WO2012060172A1 - Movie image encoding device, movie image decoding device, movie image transmitting system, method of controlling movie image encoding device, method of controlling movie image decoding device, movie image encoding device controlling program, movie image decoding device controlling program, and recording medium
- Publication number
- WO2012060172A1 (application PCT/JP2011/072291)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- encoding
- static
- decoding
- encoded
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
- H03M7/3088—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing the use of a dictionary, e.g. LZ78
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
Definitions
- the present invention relates to a moving image encoding apparatus that encodes a distance video, a control method of the moving image encoding apparatus, a moving image encoding apparatus control program, a moving image decoding apparatus that decodes data encoded by these, a control method of the moving image decoding apparatus, a moving image decoding apparatus control program, a video transmission system including these apparatuses, and a recording medium.
- the subject can be recognized in three dimensions by viewing the image for the left eye with the left eye and the image for the right eye with the right eye, so two images corresponding to the left and right eyes are necessary.
- a normal two-dimensional image (texture image)
- a distance image that is an image representing the distance from the camera to the subject.
- the distance image is an image expressing the distance from the camera to the subject for each pixel for all the subjects in the image.
- the distance from the camera to the subject can be obtained by a distance measuring device installed in the vicinity of the camera, or by analyzing texture images taken by cameras from two or more viewpoints.
- MPEG (Moving Picture Experts Group)
- ISO/IEC (International Organization for Standardization / International Electrotechnical Commission)
- the distance image is an image expressed in 8-bit gray scale.
- a higher brightness is assigned as the distance is shorter, so that the subject closer to the camera becomes whiter and the subject farther away becomes blacker.
- the distance for each pixel of the subject reflected in the texture image can be known, so that the subject can be restored to a three-dimensional shape in 256 stages.
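The 8-bit representation described above can be illustrated with a minimal sketch. The linear mapping and the names `distance_to_gray`, `near`, and `far` are assumptions made for illustration; the text only states that a shorter distance receives a higher brightness and that 256 levels are available.

```python
def distance_to_gray(distance, near, far):
    """Map a camera-to-subject distance to an 8-bit gray level (0..255).

    Assumption: a linear mapping between `near` and `far`. The source only
    states that shorter distances are assigned higher brightness.
    """
    if far <= near:
        raise ValueError("far must be greater than near")
    # Clamp so that out-of-range distances still yield a valid 8-bit value.
    clamped = min(max(distance, near), far)
    # Nearest subject -> 255 (white), farthest subject -> 0 (black).
    return round(255 * (far - clamped) / (far - near))
```

With this mapping, a subject at the `near` limit becomes white and a subject at the `far` limit becomes black, which matches the convention in the text.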
- the texture image can be converted into the texture image from the other viewpoint.
- the occurrence of occlusion is prevented by using a plurality of viewpoint images.
- the texture image from the viewpoint A is projected and converted to the texture image from the virtual viewpoint B
- the texture image from the viewpoint C, which is a viewpoint different from A, is similarly projected to a texture image from the virtual viewpoint B.
- two images from the same virtual viewpoint B can be created, and the blind spots differ between the texture image from the viewpoint A and the texture image from the viewpoint C. Therefore, the occlusion in the image from one viewpoint can be supplemented by the image from the other viewpoint.
- occlusion can be supplemented for the projection conversion to the virtual viewpoint on the line connecting the viewpoint A and the viewpoint C, and an image from the virtual viewpoint on the line can be created.
- Non-Patent Document 1 discloses a method of compressing videos from a plurality of viewpoints by efficiently eliminating the redundancy between the videos of the viewpoints (a video being a sequence of frames, each of which is an image). By applying this to the two groups of multiple texture images and multiple distance images, it becomes possible to eliminate redundancy between texture images and between distance images and thus to compress the transmission data.
- a method is disclosed for generating a distance video at a specific viewpoint by performing the above-described projection conversion on a distance video at a certain viewpoint and removing holes in the created distance video.
- Japanese Patent Publication JP 2009-105894 A (published May 14, 2009)
- the distance image represents the distance from the camera to the subject as discrete values step by step for each pixel, and has the following characteristics.
- the first feature is that the edge portion of the subject is in common with the texture image. That is, as long as the texture image includes information that can distinguish the subject and the background as an image, the boundary (edge) between the subject and the background is common to the texture image and the distance image. Therefore, the edge information of the subject is one of the large elements of the correlation information between the texture image and the distance image.
- the second feature is that the distance depth value is relatively flat in the portion inside the edge of the subject.
- the texture image shows information about the clothes worn by the person, but the distance image does not show the clothes pattern information and shows only the depth information. For this reason, the distance depth value on the same subject is flat or changes more slowly than the texture image.
- if the pixels are divided into ranges in which the distance depth value is constant, very efficient coding can be performed without orthogonal transformation or the like, because the distance depth value is constant within each range. Furthermore, if the ranges to be divided are determined based on some rule applied to the texture image, it is not necessary to transmit information regarding the divided ranges, and the coding efficiency can be further improved.
- the pixel group included in the range divided based on the distance depth value is called a segment. Since the coding efficiency can be improved as the number of segments is smaller, the shape of the segment is not limited, and the coding efficiency can be further improved by using a flexible shape.
- in Non-Patent Document 1, the compression of the texture image is promoted by reusing square segments between temporally adjacent frames or between different viewpoint images. More specifically, in Non-Patent Document 1, an image is divided into square segments (blocks), and data is compressed by reusing blocks between temporally adjacent frames or between different viewpoint images by means of a motion compensation vector or a parallax compensation vector.
- however, when the method described in Non-Patent Document 1 is applied to a distance image divided into the flexible segment shapes described above, the encoding efficiency deteriorates severely. This is because the method described in Non-Patent Document 1 is suited to a scheme in which segments are square and each segment is orthogonally transformed. When the segment shapes are made flexible, the vector information to be transmitted becomes enormous.
- since Patent Document 1 substitutes a single viewpoint for distance images of a plurality of viewpoints, the error becomes large and the quality is greatly deteriorated.
- the present invention has been made in view of the above problems, and an object of the present invention is to realize a moving picture encoding apparatus and the like that can select an encoding method.
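- a moving image encoding device is a moving image encoding device that encodes a moving image, and includes: image dividing means for dividing each frame image of the moving image into a plurality of regions; representative value determining means for determining a representative value of each region divided by the image dividing means; encoding means for generating encoded data by performing at least one of adaptive encoding, in which a number sequence in which the representative values are arranged in a predetermined order is encoded while an adaptive codebook associating sequence patterns with codewords is adaptively updated, and static encoding, in which the representative values are arranged in a predetermined order and codewords having different numbers of bits are assigned to each representative value according to the appearance rate of the representative value in the frame image; and encoding method selection means for selecting either the adaptive encoding or the static encoding for each frame image, wherein the encoding means encodes the frame image using the encoding method selected by the encoding method selection means to generate encoded data.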
- the control method of the moving image encoding device is a control method of a moving image encoding device that encodes a moving image, and includes, as steps performed by the moving image encoding device: an image dividing step of dividing each frame image of the moving image into a plurality of regions; a representative value determining step of determining a representative value of each region divided in the image dividing step; an encoding method selection step of selecting, for each frame image, either adaptive encoding, in which a number sequence in which the representative values determined in the representative value determining step are arranged in a predetermined order is encoded while an adaptive codebook associating sequence patterns with codewords is adaptively updated, or static encoding, in which the representative values are arranged in a predetermined order and codewords having different numbers of bits are assigned according to the appearance rate of the representative value in the frame image; and an encoding step of encoding the frame image using the encoding method selected in the encoding method selection step to generate encoded data.
- each frame image of a moving image is divided into a plurality of regions, and a representative value of each divided region is determined. Then, for each frame image, the representative values are arranged in a predetermined order, and it is selected whether the encoding is performed by adaptive encoding or static encoding. Each frame image is encoded using the selected encoding method.
- the predetermined order is an order in which the position corresponding to the representative value can be specified in the frame image.
- for example, the regions can be ordered by when a pixel belonging to each region is first reached in a raster scan of the frame image, and this order can be used as the predetermined order.
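The raster-scan ordering mentioned above can be sketched as follows. The function name `representative_sequence`, the `label_map` layout (a 2D list of region ids), and the `representatives` dictionary are hypothetical names introduced only for this illustration.

```python
def representative_sequence(label_map, representatives):
    """Arrange representative values in the order their regions are first
    reached by a raster scan (left to right, top to bottom).

    label_map: 2D list, label_map[y][x] is the id of the region of pixel (x, y).
    representatives: dict mapping region id -> representative value.
    """
    seen = set()
    sequence = []
    for row in label_map:
        for label in row:
            if label not in seen:
                # First pixel of this region in scan order: emit its value once.
                seen.add(label)
                sequence.append(representatives[label])
    return sequence
```

Because a decoder that knows the same region layout can repeat the same scan, the position of each value in the sequence identifies its region without extra position data.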
- if the encoding method that yields the smaller amount of information after encoding is selected, more highly compressed encoded data can be generated. Alternatively, if the encoding method with fewer processing steps is selected, more efficient encoding can be performed.
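As a rough illustration of choosing the method with the smaller output, the sketch below estimates the bit cost of both approaches for one frame's sequence of representative values. It is an assumption-laden stand-in, not the patent's procedure: static encoding is modelled with Huffman code lengths, adaptive encoding with an LZ78-style dictionary that grows during encoding, and the cost of transmitting a static codebook is ignored. All function names are hypothetical.

```python
import heapq
from collections import Counter
from math import ceil, log2


def static_code_lengths(values):
    """Huffman code length in bits for each value, from its appearance count."""
    freq = Counter(values)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (total count, unique tie-breaker, {value: depth so far}).
    heap = [(n, i, {v: 0}) for i, (v, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        merged = {v: depth + 1 for v, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def static_bits(values):
    """Total bits when frequent values get shorter codewords."""
    lengths = static_code_lengths(values)
    return sum(lengths[v] for v in values)


def adaptive_bits(values, value_bits=8):
    """Total bits for an LZ78-style coder whose codebook is updated as it encodes."""
    dictionary = {}  # sequence pattern (tuple) -> codeword index
    bits = 0
    phrase = ()
    for v in values:
        candidate = phrase + (v,)
        if candidate in dictionary:
            phrase = candidate
            continue
        # Emit (index of the longest known prefix, next value), then learn the pattern.
        bits += max(1, ceil(log2(len(dictionary) + 1))) + value_bits
        dictionary[candidate] = len(dictionary) + 1
        phrase = ()
    if phrase:
        # A trailing pattern that is already in the codebook needs only its index.
        bits += max(1, ceil(log2(len(dictionary) + 1)))
    return bits


def select_encoding(values):
    """Pick the method with the smaller estimated size for this frame."""
    a, s = adaptive_bits(values), static_bits(values)
    return ("adaptive", a) if a < s else ("static", s)
```

A real encoder would also have to signal the chosen method to the decoder, which corresponds to the encoding information described later in the text.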
- the moving picture decoding apparatus is a moving image decoding apparatus that decodes image encoded data, which is data obtained by dividing each frame image of a moving picture into a plurality of regions and encoding a number sequence in which the representative values of the regions are arranged in a predetermined order by either adaptive encoding, in which a codebook associating sequence patterns with codewords is adaptively updated, or static encoding, in which codewords having different numbers of bits are assigned to each representative value according to the appearance rate of the representative value in the frame image. The apparatus includes: acquisition means for acquiring the image encoded data and encoding information, which is information indicating the encoding method of the image encoded data; decoding means for decoding the image encoded data corresponding to each frame image, using the decoding method corresponding to the encoding method indicated by the encoding information acquired by the acquisition means, to generate decoded data; and image generating means for generating each frame image of the moving image from the decoded data generated by the decoding means and information indicating the regions.
- the control method of the moving picture decoding apparatus is a method for controlling a moving picture decoding apparatus that decodes image encoded data, which is data obtained by dividing each frame image of a moving picture into a plurality of regions and encoding a number sequence in which the representative values of the regions are arranged in a predetermined order by either adaptive encoding, in which a codebook associating sequence patterns with codewords is adaptively updated, or static encoding, in which codewords having different numbers of bits are assigned to each representative value according to the appearance rate of the representative value in the frame image. The method includes, as steps performed by the moving picture decoding apparatus: an acquisition step of acquiring the image encoded data and encoding information indicating the encoding method of the image encoded data; a decoding step of decoding the image encoded data corresponding to each frame image, using the decoding method corresponding to the encoding method indicated by the acquired encoding information, to generate decoded data; and an image generation step of generating each frame image of the moving image from the decoded data generated in the decoding step and information indicating the regions.
- according to the above configuration or method, image encoded data is decoded. The image encoded data is data obtained by dividing each frame image of a moving image into a plurality of regions and encoding a number sequence in which the representative values of the regions are arranged in a predetermined order, either by adaptive encoding, in which a codebook associating sequence patterns with codewords is adaptively updated, or by static encoding, in which codewords whose numbers of bits vary depending on the appearance rate of the representative value in the frame image are assigned. Then, an image is generated from the decoded data and the information indicating the regions.
- adaptive decoding is performed for the adaptively encoded image encoded data, and static decoding is performed for the statically encoded image encoded data. As described above, decoding can be performed appropriately.
- the moving image encoding device and the moving image decoding device may be realized by a computer.
- the moving image encoding device and the moving image decoding device are operated by causing the computer to operate as the respective means.
- a video encoding device and a video decoding device control program realized by a computer and a computer-readable recording medium on which the control program is recorded also fall within the scope of the present invention.
- the moving image encoding apparatus includes: image dividing means that divides each frame image of a moving image into a plurality of regions; representative value determining means that determines a representative value of each region divided by the image dividing means; encoding means that generates encoded data by performing, for each frame image, at least one of adaptive encoding, in which a number sequence in which the representative values determined by the representative value determining means are arranged in a predetermined order is encoded while an adaptive codebook associating sequence patterns with codewords is adaptively updated, and static encoding, in which the representative values determined by the representative value determining means are arranged in a predetermined order and codewords having different numbers of bits are assigned to each representative value according to the appearance rate of the representative value in the frame image; and encoding method selection means that selects, for each frame image, either the adaptive encoding or the static encoding. The encoding means encodes the frame image using the encoding method selected by the encoding method selection means to generate encoded data.
- the control method of the moving image encoding device is a control method of a moving image encoding device that encodes a moving image, and includes, as steps performed by the moving image encoding device: an image dividing step of dividing each frame image of the moving image into a plurality of regions; a representative value determining step of determining a representative value of each region divided in the image dividing step; an encoding method selection step of selecting, for each frame image, either adaptive encoding, in which a number sequence in which the representative values determined in the representative value determining step are arranged in a predetermined order is encoded while an adaptive codebook associating sequence patterns with codewords is adaptively updated, or static encoding, in which the representative values are arranged in a predetermined order and codewords having different numbers of bits are assigned according to the appearance rate of the representative value in the frame image; and an encoding step of encoding the frame image using the encoding method selected in the encoding method selection step to generate encoded data.
- the decoding apparatus includes: acquisition means for acquiring the image encoded data and encoding information, which is information indicating the encoding method of the image encoded data; decoding means that decodes the image encoded data corresponding to each frame image, using the decoding method corresponding to the encoding method indicated by the encoding information acquired by the acquisition means, to generate decoded data; and image generating means for generating each frame image of the moving image from the decoded data and information indicating the regions.
- the control method of the moving picture decoding apparatus includes: an acquisition step in which the moving picture decoding apparatus acquires the image encoded data and encoding information, which is information indicating the encoding method of the image encoded data; a decoding step of decoding the image encoded data to generate decoded data; and an image generation step of generating each frame image of the moving image from the decoded data generated in the decoding step and information indicating the regions.
- FIG. 1 illustrates an embodiment of the present invention and is a block diagram illustrating a main configuration of a moving image encoding device. It is a figure for demonstrating which picture a certain picture refers in AVC encoding. It is a figure for demonstrating the example which divides
- FIG. 3 is a block diagram illustrating a main configuration of a moving image decoding apparatus according to an embodiment of the present invention. It is a flowchart which shows operation
- FIG. 32 showing another embodiment of the present invention, is a block diagram illustrating a configuration of a main part of a video encoding device. It is a figure for demonstrating MVC encoding. It is a block diagram which shows the said other embodiment and shows the principal part structure of a moving image decoding apparatus. It is a flowchart figure which shows an example of the operation
- FIG. 30 shows the structure of the transmitter which mounts a moving image encoding apparatus.
- FIG. 30B is a block diagram showing a configuration of a receiving apparatus equipped with a moving picture decoding apparatus.
- FIG. 30 is a diagram for explaining that a moving image decoding apparatus and a moving image encoding apparatus can be used for recording and reproduction of moving images.
- FIG. 30A illustrates a recording apparatus in which the moving image encoding apparatus 2 is mounted.
- FIG. 30B is a block diagram showing the configuration of a playback apparatus equipped with a moving picture decoding apparatus.
- One embodiment of the present invention will be described below with reference to FIGS. First, the moving picture coding apparatus 1 according to the present embodiment will be described. Roughly speaking, the moving image encoding apparatus 1 according to the present embodiment is a device that generates encoded data by encoding, for each frame constituting a three-dimensional moving image, the texture image and the distance image (an image in which each pixel value is expressed by a depth value) that constitute the frame.
- the moving image encoding apparatus 1 uses H.264 for encoding texture images.
- MPEG (Moving Picture Experts Group)
- the above encoding technique unique to the present invention is an encoding technique developed by paying attention to the fact that there is a correlation between a texture image and a distance image.
- when the texture image includes information indicating the edge of the subject, the edge of the subject in the distance image is the same as in the texture image, and all or substantially all of the pixels included in the subject area are likely to take the same distance value.
- FIG. 1 is a block diagram illustrating a main configuration of the moving image encoding device 1.
- the moving image encoding apparatus 1 includes an image encoding unit (AVC encoding means) 11, an image decoding unit 12, a distance image encoding unit 20, and a packaging unit 28.
- the distance image encoding unit 20 includes an image division processing unit 21, a distance image division processing unit (image dividing means) 22, a distance value correcting unit 23, a number assigning unit (representative value determining means) 24, and a distance value encoding unit (encoding method selection means, encoding means, static codebook creation means) 25.
- the image encoding unit 11 encodes the texture image # 1 by AVC (Advanced Video Coding) coding defined in the H.264 / MPEG-4 AVC standard.
- the encoded data (AVC encoded data) # 11 is output to the image decoding unit 12 and the packaging unit 28.
- the image encoding unit 11 also outputs, to the distance value encoding unit 25, picture information # 11A indicating the type of the selected picture (described later) and, when the selected picture is a P picture or a B picture, information for identifying the reference picture.
- the picture information # 11A includes information indicating the type of the selected picture.
- the image decoding unit 12 decodes the texture image # 1 ′ from the encoded data # 11 of the texture image # 1 acquired from the image encoding unit 11. Then, the texture image # 1 ′ is output to the image division processing unit 21.
- the decoded texture image # 1 ′ is the same as the image decoded by the moving image decoding device 2 on the receiving side, provided that no bit error occurs in transmission from the moving image encoding device 1. It differs from the original texture image # 1, because AVC encoding of the original texture image # 1 introduces a quantization error in the orthogonal transform coefficients, which are computed in units called blocks that divide the pixels into squares.
- FIG. 2 is a diagram for explaining which picture a certain picture refers to in AVC coding.
- the pictures (201 to 209) in FIG. 2 constitute a video in this order, and a video (moving image) is obtained by switching the pictures over time.
- a picture means one image (frame image) at a certain discrete time.
- AVC coding performs prediction before and after a picture in order to eliminate redundancy in the time direction.
- here, prediction means dividing the screen into square areas (blocks) of a certain size and, for each area of the picture to be coded, finding a similar area in other, temporally related pictures.
- I picture refers to a picture that is not predicted using another picture.
- a P picture is a picture that uses only a temporally forward picture for prediction.
- a B picture is a picture that uses both forward and backward pictures for prediction. Prediction is performed for each block. For example, in the case of a B picture, up to two pictures can be specified.
- the B picture 201 in FIG. 2 can be decoded only after the I picture 202 as the reference object arrives.
- the B picture 203 whose reference pictures are the I picture 202 and the P picture 205 can be decoded only after the P picture 205 arrives.
- the image decoding unit 12 can perform the decoding process on the B picture that refers to the subsequent picture only after the I picture or the P picture to be referenced arrives. In some cases, an I picture that is a later picture is first decoded and output to the image division processing unit 21.
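The dependency between arrival order and decoding order described above can be sketched as follows. The function `decode_order` and the `references` dictionary are hypothetical; the picture numbers follow FIG. 2 as described in the text, and the assumption that the P picture 205 refers to the I picture 202 is made here only to complete the example.

```python
def decode_order(arrival, references):
    """Return the order in which pictures become decodable.

    arrival: picture ids in the order they arrive.
    references: dict mapping picture id -> list of picture ids it predicts
                from (an empty list for an I picture).
    """
    decoded = set()
    waiting = []
    order = []
    for pic in arrival:
        waiting.append(pic)
        progress = True
        while progress:
            progress = False
            for p in list(waiting):
                # A picture can be decoded once all its reference pictures are decoded.
                if all(r in decoded for r in references.get(p, [])):
                    waiting.remove(p)
                    decoded.add(p)
                    order.append(p)
                    progress = True
    return order


# B 201 refers to I 202; B 203 refers to I 202 and P 205 (from the text).
# P 205 referring to I 202 is an assumption for this example.
refs = {201: [202], 202: [], 203: [202, 205], 205: [202]}
print(decode_order([201, 202, 203, 205], refs))
```

In this example the B picture 201 has to wait for the I picture 202, and the B picture 203 has to wait for the P picture 205, which is the behaviour the text attributes to the image decoding unit 12.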
- the image division processing unit 21 divides the entire area of the texture image into a plurality of segments (areas). Then, the image division processing unit 21 outputs segment information # 21 including position information of each segment to the distance image division processing unit 22.
- the segment position information is information indicating the position of the segment in the texture image # 1.
- the texture image is repeatedly smoothed while preserving edge information, which removes noise from the image. Then, adjacent segments of similar color are joined together. However, a segment whose width or height exceeds a predetermined number of pixels is likely to contain varying distance values, and is therefore divided so that it does not exceed the predetermined number of pixels.
- the texture video can be divided into segment units.
- FIGS. 3 to 5 are diagrams for explaining an example of dividing a texture image.
- the image division processing unit 21 divides the image 301 into a plurality of segments, as shown in the image 401 of FIG. 4. In the image 301, the hair on the left and right of the girl's head is drawn in two colors, brown and light brown, and the image division processing unit 21 defines a closed region composed of pixels of similar colors, such as brown and light brown, as one segment (FIG. 4).
- the skin portion of the girl's face is also drawn in two colors, the skin color and the pink color of the cheek portion, but the image division processing unit 21 defines the skin color region and the pink region as separate segments (FIG. 4). This is because the skin color and the pink color are not similar (that is, the difference between the skin color pixel value and the pink pixel value exceeds a predetermined threshold value).
- the closed region drawn by the same pattern indicates one segment.
- FIG. 6 shows a distance image 601 corresponding to the image 301 (texture image) in FIG. As shown in FIG. 6, the distance image is an image having a different distance value for each segment.
- when the distance image (frame image) # 2, which is a frame image of the distance video, and the segment information # 21 are input, the distance image division processing unit 22 extracts, for each segment in the texture image # 1 ′, a distance value set composed of the distance values of the pixels included in the corresponding segment (region) in the distance image # 2. Then, the distance image division processing unit 22 generates, from the segment information # 21, segment information # 22 in which the distance value set and the position information are associated with each segment, and outputs the generated segment information # 22 to the distance value correction unit 23.
- the distance image division processing unit 22 refers to the input segment information # 21, identifies the position of each segment in the texture image # 1 ′, and divides the distance image # 2 into a plurality of segments using the same division pattern as the segment division pattern in the texture image # 1 ′. Therefore, the segment division pattern in the texture image # 1 ′ and the segment division pattern in the distance image # 2 are the same.
- the distance value correction unit 23 calculates the mode value as the representative value # 23a from the distance value set of the segment included in the segment information # 22 for each segment of the distance image # 2. That is, when the segment i in the distance image # 2 includes N pixels, the distance value correcting unit 23 calculates the mode value from the N distance values.
- instead of the mode value, the distance value correcting unit 23 may calculate the average or the median of the N distance values as the representative value # 23a. In addition, when the average or the median is not an integer, the distance value correcting unit 23 may convert it to an integer value by rounding down, rounding up, or rounding to the nearest integer.
- the distance value correcting unit 23 replaces the distance value set of each segment included in the segment information # 22 with the representative value # 23a of the corresponding segment, and outputs it to the number assigning unit 24 as the segment information # 23.
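The extraction of distance value sets and their replacement by a mode-based representative value can be sketched as follows. The data layout (2D lists for the distance image and a `label_map` of segment ids derived from the texture image) and the function names are assumptions for illustration.

```python
from collections import Counter


def representative_values(distance_image, label_map):
    """Mode of the distance values inside each segment.

    distance_image, label_map: equally sized 2D lists; label_map[y][x] is the
    id of the segment (obtained from the texture image) containing pixel (x, y).
    """
    value_sets = {}
    for dist_row, label_row in zip(distance_image, label_map):
        for dist, label in zip(dist_row, label_row):
            # Collect the distance value set of each segment.
            value_sets.setdefault(label, []).append(dist)
    # most_common(1) returns the most frequent value of each set.
    return {label: Counter(vals).most_common(1)[0][0]
            for label, vals in value_sets.items()}


def flatten_segments(distance_image, label_map):
    """Replace every pixel's distance value with its segment's representative value."""
    reps = representative_values(distance_image, label_map)
    return [[reps[label] for label in row] for row in label_map]
```

Swapping the mode for an average or a median, as the text allows, would only change the expression that reduces each value set to a single number.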
- the reason for calculating the representative value # 23a is as follows. Ideally, all the pixels included in each segment in the distance image # 2 have the same distance value. However, pixels having different distance values in the same segment of the distance image # 2 when the edges of the texture image # 1 and the distance image # 2 are shifted due to, for example, inaccuracy of the distance image # 2. Groups may exist. In such a case, for example, if the distance value of the pixel group having a small pixel value is replaced with the distance value of the pixel group having the maximum pixel value, the distance values in the same segment are all the same, and the distance image # 2 is segmented. It can be rolled into a shape.
- the above processing also has the effect of improving the accuracy of the edge portions of the distance image # 2.
- the number assigning unit 24 associates identifiers having different values with each representative value # 23a included in the segment information # 23. Specifically, for each of the M sets of position information and representative value # 23a included in the segment information # 23, the number assigning unit 24 associates the representative value # 23a with a segment number # 24 determined according to the position information. Then, the number assigning unit 24 outputs the data in which the segment number # 24 and the representative value # 23a are associated to the distance value encoding unit 25.
- pixels connected in the vertical or horizontal direction are included in the same segment when their distance values are the same, but pixels that have the same distance value and adjoin only in the diagonal direction are not considered to be included in the same segment. That is, a segment is formed by a group of pixels having the same distance value connected in the vertical or horizontal direction.
- FIGS. 7 to 9 are diagrams for explaining pixels included in a segment.
- FIGS. 10 to 12 are diagrams for explaining a method of assigning segment number # 24.
- segment numbers # 24 only need to be unique within the same image (frame) of the same video. One conceivable method is therefore to scan the pixels line by line from the upper left to the lower right of the image (FIG. 10, raster scan) and, whenever the segment including the target pixel has not yet been assigned a number, to assign numbers in order starting from 0.
- segment number # 24 is assigned to the image 501 in FIG. 5 by raster scan
- segment number “0” is assigned to the segment R0 positioned at the head in the raster scan order as shown in FIG.
- segment number “1” is assigned to the segment R1 that is positioned second in the raster scan order.
- segment numbers “2” and “3” are assigned to the third and fourth segments R2 and R3, respectively, in the raster scan order.
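The connectivity rule and the raster-scan numbering described above can be sketched together. This is an illustration under stated assumptions: the function name `label_segments` and the flood-fill approach are choices made for the example, and the input is a small hypothetical image rather than the image 501 of FIG. 5.

```python
from collections import deque


def label_segments(image):
    """Number 4-connected regions of equal value in raster-scan order."""
    height, width = len(image), len(image[0])
    labels = [[None] * width for _ in range(height)]
    next_number = 0
    for y in range(height):          # raster scan: top to bottom,
        for x in range(width):       # left to right within each line
            if labels[y][x] is not None:
                continue             # pixel already belongs to a numbered segment
            labels[y][x] = next_number
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                # Vertical and horizontal neighbours only; diagonals are ignored.
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and labels[ny][nx] is None
                            and image[ny][nx] == image[cy][cx]):
                        labels[ny][nx] = next_number
                        queue.append((ny, nx))
            next_number += 1
    return labels


image = [
    [5, 5, 9],
    [5, 9, 9],
    [9, 5, 5],
]
for row in label_segments(image):
    print(row)
```

In this example the pixel with value 9 in the bottom-left corner touches another 9 only diagonally, so it receives its own segment number.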
- the distance value encoding unit 25 performs compression encoding processing on the data (segment table 1201) in which the segment number # 24 and the representative value # 23a are associated, and outputs the obtained encoded data (image encoded data) # 25, together with the reference flag (encoding information) # 25A and the reference picture information (image specifying information) # 25B for static compression encoding, to the packaging unit 28.
- the distance value is expressed in 256 stages, and the data in which the segment number # 24 and the representative value # 23a are associated with each other is as shown in FIG. It can be expressed as a numerical sequence 1301.
- this number sequence is encoded by a hybrid method of adaptive compression coding and static compression coding.
- the hybrid method is a method of performing compression encoding using a preferable encoding method among adaptive compression encoding and static compression encoding.
- Adaptive compression coding is a coding method that, during compression, creates a correspondence table (adaptive codebook) between codewords and pre-coding values (sequence patterns) and encodes while adaptively updating that codebook. This is a suitable method when the appearance rate of each value before encoding is not known. However, the compression rate is low compared to static compression coding.
- static compression encoding refers to encoding in which the number of bits of a code word is made different based on the appearance rate when the appearance rate of each value before encoding is known.
- the appearance rate of each value before encoding is required, so that a sequence having a known appearance rate can be encoded with a high compression rate.
- in order to obtain the appearance rate of each value, the sequence is first scanned once to the end and the frequency of each value is counted.
- static compression encoding is then performed based on the appearance rate. Obtaining the appearance rate therefore requires an extra scan of the number sequence, which has the drawback of increasing the processing time. Also, since the decoding side apparatus must perform decoding corresponding to the encoding method, the decoding side apparatus similarly takes time for processing.
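The extra scan needed to obtain the appearance rates can be sketched as a simple frequency count. The function name `appearance_rates` and the sample sequence are assumptions made for illustration.

```python
from collections import Counter


def appearance_rates(sequence):
    """Scan the whole sequence once and return each value's appearance rate."""
    counts = Counter(sequence)
    total = len(sequence)
    return {value: count / total for value, count in counts.items()}


# Hypothetical sequence of representative distance values.
print(appearance_rates([10, 10, 20, 10]))
```

Only after this pass is complete can codewords be assigned, which is why static coding costs one more scan than adaptive coding.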
- adaptive compression coding adaptive coding
- static compression coding static coding
- the adaptive compression coding method is an adaptive entropy coding method that adaptively updates the codebook (event occurrence probability table) of Huffman coding and arithmetic coding methods classified as entropy coding.
- Various methods have been proposed.
- LZW Lempel-Ziv-Welch
- the LZW coding method, which adaptively updates the codebook (dictionary) of Lempel-Ziv coding, a typical example of dictionary-based coding, will be described.
- the LZW system is an encoding system developed by Terry Welch as an example of an implementation of the LZ78 encoding system announced by Abraham Lempel and Jacob Ziv in 1978.
- this method focuses on the patterns in which values are arranged: each newly appearing pattern is registered in the code book and, at the same time, a code word is output. On the decoding side, new patterns are registered in the code book and decoded based on the received code words in the same manner as on the encoding side, whereby the original sequence can be completely reproduced. Therefore, this method is a so-called lossless encoding method in which no information is lost by encoding.
- This encoding method is one of the encoding methods having excellent compression efficiency, and is widely used practically in image compression and the like.
- FIG. 14 shows this LZW algorithm.
- the LZW method is an encoding method developed for compressing a character string
- the expression assumes a case where the character string is compressed.
- since a character string can be expressed by a binary (bit) sequence of several digits, this algorithm can be applied to the numerical sequence 1301 of the distance values as it is.
- the code book is initialized and all single characters are registered in the code book (S51). For example, if only three letters a, b, and c of the alphabet are used, these three alphabets are registered in the code book, and 0 is assigned to a, 1 is assigned to b, and 2 is assigned to c.
- the first character of the character string to be encoded is read and assigned to ω (ω is a variable) (S52). Further, the next one character is read and assigned to K (K is a variable) (S53). Then, it is further determined whether or not there is an input character string (S54). If there is no further input character string (NO in S54), the code word corresponding to the character string stored in ω is output and the process ends (S55). On the other hand, if there are more input character strings (YES in S54), it is determined whether or not the character string ωK exists in the code book (S56).
- as can be seen from this algorithm, in the LZW method, the more often the same pattern appears in the character string to be encoded, the more pattern portions can be replaced with single code words, so that significant compression becomes possible.
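The LZW procedure can be sketched as below, using the three-letter alphabet a, b, c from the example. The variables `omega` and `k` play the roles of the two working variables of steps S52 to S56. Because the description above stops at S56, the handling after that test follows the standard LZW procedure, which is an assumption here; the function names are also hypothetical.

```python
def lzw_encode(text, alphabet):
    """Encode text as a list of code words, growing the code book on the fly."""
    codebook = {symbol: index for index, symbol in enumerate(alphabet)}  # S51
    if not text:
        return []
    omega = text[0]                               # S52
    output = []
    for k in text[1:]:                            # S53
        if omega + k in codebook:                 # S56: pattern already known
            omega = omega + k
        else:
            output.append(codebook[omega])        # emit code word for omega
            codebook[omega + k] = len(codebook)   # register the new pattern
            omega = k
    output.append(codebook[omega])                # S55: flush the last pattern
    return output


def lzw_decode(codes, alphabet):
    """Rebuild the text by registering the same patterns as the encoder."""
    codebook = {index: symbol for index, symbol in enumerate(alphabet)}
    if not codes:
        return ""
    previous = codebook[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in codebook:
            entry = codebook[code]
        else:
            # The encoder used a pattern it had registered in its previous step.
            entry = previous + previous[0]
        pieces.append(entry)
        codebook[len(codebook)] = previous + entry[0]
        previous = entry
    return "".join(pieces)


codes = lzw_encode("abababab", "abc")
print(codes)                      # [0, 1, 3, 5, 1]
print(lzw_decode(codes, "abc"))   # abababab
```

Eight input characters are represented by five code words, and the decoder recovers the input without ever receiving the code book.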
- each codeword included in the output codeword string 1601 is converted into a 9-digit binary value as shown in FIG. 17 and output to the packaging unit 28 as a binary string 1701.
- the code word “89” is converted into a binary “001011001”
- the code word “182” is converted into a binary “010110110”, and so on.
- a binary value of 9 digits is used as a value representing each code word.
- because the code book becomes larger as the encoding progresses, once the number of code words exceeds 512 they can no longer be expressed with a 9-digit binary value.
- the LZW method has a rule that the size of the codebook increases by 1 at the timing when the code word is output, so the number of digits can be determined on the decoding side. Therefore, if the decoding side counts the number of codewords to be received, it is possible to determine the number of digits at each time point.
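The conversion of code words to fixed-width binary values, and the way the required number of digits follows from the code book size, can be sketched as follows. The helper names `to_bits` and `digits_needed` are assumptions made for illustration.

```python
def to_bits(code_word, digits=9):
    """Write a code word as a zero-padded binary string of the given width."""
    return format(code_word, "0{}b".format(digits))


def digits_needed(codebook_size):
    """Binary digits needed to express indices 0 .. codebook_size - 1."""
    return max(1, (codebook_size - 1).bit_length())


print(to_bits(89))          # 001011001
print(to_bits(182))         # 010110110
print(digits_needed(512))   # 9
print(digits_needed(513))   # 10
```

Since the code book grows by one entry per output code word, a decoder that counts received code words can evaluate `digits_needed` at each point and stay in step with the encoder.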
- the codebook size increases as the coding continues, so it is necessary to limit the codebook size at some point.
- the maximum codebook size is determined in advance, and when the codebook reaches the specified size, either the codebook is reset to its initial state, or patterns are replaced with new ones in order starting from the pattern that has gone unused the longest (the LZT method).
- Here, it is assumed that the LZT method is used.
- when this LZT encoding is applied to the sequence 1301, a plurality of distance values appearing in the same pattern can be expressed by one code word, so that the number of code words becomes smaller than the number of distance values. As a result, the amount of data can be compressed.
- FIG. 18 is a diagram showing the number of appearances of each value of 0 to 255 in a certain distance image. As shown in FIG. 18, values appear only at intervals of 6 to 7, and the values in between have zero appearances. Thus, in a distance image, not all values appear; the values that do appear may be spaced at intervals.
- distance images that are close to each other in time are similar to each other.
- the codebook generated for static compression coding may therefore be reusable for pictures that are temporally close, which makes efficient compression coding possible.
- when the image encoding unit 11 encodes the first texture image as an I picture, information indicating that the I picture has been selected (picture information # 11A) and the segment table 1201 of the distance image corresponding to the texture image are input to the distance value encoding unit 25.
- the distance value encoding unit 25 creates a code book 1501 from the segment table 1201 in accordance with the above-described adaptive compression encoding algorithm, and converts each code word of the code word string 1601 into a 9-digit binary value.
- a binary string 1701 is output.
- “0” is set as the reference flag # 25A for static compression encoding, and the binary string 1701 is output to the packaging unit 28. This reference flag is set to “1” when static compression encoding is performed.
- the appearance rate of the static encoding table 1901 is a value obtained by dividing the number of appearances of each value by the total number of segments.
- the code word is a code word when Huffman coding is performed based on the appearance rate. Huffman coding is a well-known technique, and a detailed description thereof is omitted.
- the distance values “0” and “255” have a low appearance rate, so the code word is 10 bits (“1100011110” and “1100001001”).
- the code word is 5 bits (“10011” and “11010”).
- since the appearance rates of the distance values “1”, “125”, “127”, “128”, “129”, and “131” are 0, no code word is assigned to them.
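Building a static code book from appearance counts can be sketched with a textbook Huffman construction. This is a minimal sketch: the function name `huffman_codebook` and the sample data are assumptions, and the resulting bit patterns will not match those of the static encoding table 1901, since Huffman codes depend on the actual counts and on tie-breaking.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codebook(values):
    """Map each value that appears to a prefix-free code word (bit string)."""
    counts = Counter(values)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(n, next(tie), {v: ""}) for v, n in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n0, _, zero = heapq.heappop(heap)   # two least frequent subtrees
        n1, _, one = heapq.heappop(heap)
        merged = {v: "0" + c for v, c in zero.items()}
        merged.update({v: "1" + c for v, c in one.items()})
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return heap[0][2]


# Hypothetical representative values of one picture.
values = [126] * 5 + [130] * 2 + [0] + [255]
book = huffman_codebook(values)
for value in sorted(book):
    print(value, book[value])
```

Values that never appear, such as 1 or 125 in this sample, receive no code word, and rarer values receive longer code words than frequent ones.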
- a code book (static code book) 1902 in which the distance value and the code word are associated in the static coding table 1901 is stored until the next picture processing.
- the saving of the code book 1902 is not limited to the processing of the next picture, and the code book 1902 may be saved until the code book 1902 is no longer needed.
- when the image encoding unit 11 encodes the second texture image as, for example, a P picture, information indicating that the P picture has been selected, information indicating which picture was referenced, and the segment table 1201 of the distance image corresponding to the texture image are input to the distance value encoding unit 25.
- this P picture refers to the previous I picture.
- in AVC encoding, up to two reference destinations are permitted for a B picture, so there may be two reference destinations. In this case, information on both reference destinations is input.
- since this code book 1902 is created based on the appearance rate of each value in the previous picture, the more similar the previous picture and the current picture are, the more efficient the encoding becomes. However, because the code book 1902 is created from the previous picture, the current picture may contain a value that is not included in the code book 1902. In this case, static compression encoding is not performed.
- adaptive compression encoding is performed on the current picture, and the number of encoded data bits after encoding is calculated. Similarly to the first picture, the number of occurrences of each value in the picture is counted, and a static coding table 1901 is created in which each value is associated with the appearance rate of each value and a code word.
- the reference flag is set to “0” as in the case of the first picture, and the encoded data produced by adaptive compression encoding is output.
- the reference flag is set to “1”, and the reference flag, the picture information # 25B indicating the picture number of the reference destination (here, the immediately preceding picture), and the encoded data subjected to static compression encoding (Huffman encoding) are output.
- FIG. 20 is a flowchart showing the flow of processing for determining data to be output in the distance value encoding unit 25.
- the distance value encoding unit 25 determines whether or not there is a code table (code book) referred to for static compression encoding of a picture (S81). In this determination, in the case of the first image or IDR picture, since similarity with the previous image cannot be expected, it is assumed that there is no code table to be referred to. In other cases, it is assumed that there is a code table to be referenced.
- adaptive compression coding is performed, the number of occurrences of each value in the picture is counted, and a static encoding table 1901 in which each value is associated with its appearance rate and a code word is created (S82). Then, the code book 1902 of the static encoding table 1901 is stored. Further, “0” is set as a reference flag and output (S86).
- the code table for two times immediately before the picture to be encoded can be considered. If the range of the code table to be referenced is limited to the previous one, it is not necessary to transmit the reference picture number to the decoding side.
- adaptive compression encoding is performed, and the number of bits of encoded data after encoding is calculated. Also, the number of occurrences of each value in the picture is counted, and a static encoding table 1901 is created in which each value, the appearance rate of each value, and a code word are associated with each other. Then, the number of bits after encoding is compared between when static compression encoding is performed and when adaptive compression encoding is performed (S83).
- the encoded data subjected to the static compression encoding is output and the reference flag is set to “1”.
- the reference flag # 25A and picture information # 25B indicating the reference picture are output (S85).
- in step S86, the encoded data after the adaptive compression encoding is output.
- the reference flag is set to “0” and output.
- the above is the process of determining data to be output in the distance value encoding unit 25.
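The decision between the two encodings can be sketched as follows. This is a simplified illustration, not the flowchart of FIG. 20 itself: the function names, the use of bit strings, and the rule that a tie favours adaptive encoding are assumptions, and the picture information # 25B that accompanies a reference flag of 1 is left out.

```python
def static_encode(values, codebook):
    """Encode with a stored static code book; None if any value is missing."""
    if any(value not in codebook for value in values):
        return None
    return "".join(codebook[value] for value in values)


def choose_output(adaptive_bits, values, reference_codebook):
    """Return (reference_flag, encoded_bits) for one picture."""
    if reference_codebook is None:                 # S81: no code book to refer to
        return 0, adaptive_bits                    # S86
    static_bits = static_encode(values, reference_codebook)
    # S83: compare the number of bits produced by the two methods.
    if static_bits is not None and len(static_bits) < len(adaptive_bits):
        return 1, static_bits                      # S85
    return 0, adaptive_bits                        # S86


# Hypothetical code book kept from the previous picture.
previous_book = {126: "1", 130: "00"}
print(choose_output("0" * 9, [126, 126, 130], previous_book))
print(choose_output("0" * 9, [126, 7], previous_book))
```

The second call falls back to adaptive encoding because the value 7 has no code word in the code book inherited from the previous picture.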
- the packaging unit 28 associates the input encoded data # 11 of the texture image # 1, the encoded data # 25 of the distance image # 2, the reference flag # 25A, and the picture information # 25B with one another, and outputs them as encoded data # 28 to the video decoding device 2.
- the picture information # 25B is not output when the reference flag # 25A is “0”.
- the packaging unit 28 integrates the texture image encoded data # 11 and the distance image encoded data # 25 in accordance with H.264.
- FIG. 21 is a diagram schematically showing the configuration of the NAL unit 1801. As shown in FIG. 21, the NAL unit 1801 is composed of three parts: a NAL header 1802, an RBSP 1803, and an RBSP trailing bit 1804.
- the RBSP 1803 contains encoded data # 11 and encoded data # 25, which are encoded data.
- the RBSP trailing bit 1804 is an adjustment bit for specifying the last bit position of the RBSP 1803.
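How trailing bits mark the last bit position of a payload can be sketched with bit strings. In H.264 the RBSP trailing bits consist of a single 1 bit followed by 0 bits up to the next byte boundary; the helper names below are hypothetical, and the sketch ignores everything else in a real NAL unit (header, emulation prevention bytes).

```python
def add_rbsp_trailing_bits(payload_bits):
    """Append a stop bit 1, then zero bits up to the next byte boundary."""
    bits = payload_bits + "1"
    bits += "0" * (-len(bits) % 8)
    return bits


def strip_rbsp_trailing_bits(bits):
    """Recover the payload: everything before the last 1 bit."""
    return bits[:bits.rindex("1")]


packed = add_rbsp_trailing_bits("10110")
print(packed)                            # 10110100
print(strip_rbsp_trailing_bits(packed))  # 10110
```

Because the final 1 bit is always the stop bit, the receiver can locate the end of the payload even when the payload itself ends in zeros.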
- the reference flag # 25A and the picture information # 25B are stored in, and transmitted with, an extension of the header information called the PPS (Picture Parameter Set), which indicates the coding mode of the entire picture.
- the moving picture encoding apparatus 1 encodes the texture image # 1 using the AVC encoding defined in the H.264 / MPEG-4 AVC standard, but the present invention is not limited to this. That is, the image encoding unit 11 of the moving image encoding apparatus 1 may encode the texture image # 1 using another encoding method such as MPEG-2 or MPEG-4.
- FIG. 22 is a flowchart showing the operation of the moving image encoding apparatus 1.
- the operation of the moving image encoding apparatus 1 described here is an operation of encoding a texture image and a distance image of the t-th frame from the head in a moving image including a large number of frames. That is, the moving image encoding apparatus 1 repeats the operation described below as many times as the number of frames of the moving image in order to encode the entire moving image.
- each data # 1 to # 28 is interpreted as data of the t-th frame.
- the image encoding unit 11 and the distance image division processing unit 22 respectively receive the texture image # 1 and the distance image # 2 from the outside of the moving image encoding device 1 (S1).
- the texture image # 1 and the distance image # 2 received from the outside are correlated with each other in the content of the image, as can be seen, for example, by comparing the texture image of FIG. 3 and the distance image of FIG.
- the image encoding unit 11 encodes the texture image # 1 by the AVC encoding method stipulated in the H.264 / MPEG-4 AVC standard, and outputs the obtained texture image encoded data # 11 to the packaging unit 28 and the image decoding unit 12 (S2).
- in step S2, the image encoding unit 11 also outputs, to the distance value encoding unit 25, the selected picture type and, when the selected picture is a B picture or a P picture, the reference picture.
- the image decoding unit 12 decodes the texture image # 1 ′ from the encoded data # 11 and outputs it to the image division processing unit 21 (S3). Thereafter, the image division processing unit 21 defines a plurality of segments from the input texture image # 1 ′ (S4).
- the image division processing unit 21 generates segment information # 21 including position information of each segment, and outputs it to the distance image division processing unit 22 (S5).
- as the position information of a segment, for example, the coordinate values of the pixel group located at the boundary between that segment and other segments can be used. That is, when each segment is defined from the texture image of FIG. 3, the coordinate value of each coordinate located in the contour portion of the closed region in FIG. 5 becomes the position information of the segment.
- the distance image division processing unit 22 divides the input distance image # 2 into a plurality of segments. Then, the distance image division processing unit 22 extracts a distance value of each pixel included in the segment as a distance value set for each segment of the distance image # 2. Furthermore, the distance image division processing unit 22 associates the distance value set extracted from the corresponding segment with the position information of each segment included in the segment information # 21. Then, the distance image division processing unit 22 outputs the segment information # 22 obtained thereby to the distance value correction unit 23 (S6, image division step).
- the distance value correction unit 23 calculates a representative value # 23a from the distance value set of the segment included in the segment information # 22 for each segment of the distance image # 2. Then, each of the distance value sets included in the segment information # 22 is replaced with the representative value # 23a of the corresponding segment, and is output to the number assigning unit 24 as the segment information # 23 (S7, representative value determining step).
- the number assigning unit 24 associates the representative value # 23a with the segment number # 24 corresponding to the position information for each set of the position information and the representative value # 23a included in the segment information # 23, and outputs the M sets of the representative value # 23a and the segment number # 24 to the distance value encoding unit 25 (S8).
- the distance value encoding unit 25 performs encoding processing on the input representative value # 23a and segment number # 24, and outputs the obtained encoded data # 25 to the packaging unit 28 (S9, encoding scheme selection step, encoding step).
- the packaging unit 28 integrates the encoded data # 11 output from the image encoding unit 11 in step S2 and the encoded data # 25 output from the distance value encoding unit 25 in step S9, and outputs the obtained encoded data # 28 to the video decoding device 2 (S10).
- the video decoding device 2 decodes the texture image # 1 ′ and the distance image # 2 ′ from the encoded data # 28 transmitted from the above-described video encoding device 1. Then, the decoded texture image # 1 ′ and distance image # 2 ′ are output as frame images to a device constituting the moving image.
- FIG. 23 is a block diagram illustrating a main configuration of the video decoding device 2.
- the moving image decoding apparatus 2 includes an image decoding unit 12, an image division processing unit 21 ′, an unpackaging unit (acquisition unit) 31, a distance value decoding unit (decoding unit, static codebook generation unit) 32, and a distance value assigning unit (image generating means) 33.
- the unpackaging unit 31 extracts the encoded data # 11 of the texture image # 1 and the encoded data # 25 of the distance image # 2 from the encoded data # 28.
- the encoded data # 11 of the texture image # 1 is output to the image decoding unit 12, and the encoded data # 25 of the distance image # 2 is output to the distance value decoding unit 32.
- the image decoding unit 12 decodes the texture image # 1 ′ from the encoded data # 11.
- the image decoding unit 12 is the same as the image decoding unit 12 included in the moving image encoding device 1. That is, as long as no noise is mixed into the encoded data # 28 while it is transmitted from the moving image encoding apparatus 1 to the moving image decoding apparatus 2, the image decoding unit 12 decodes a texture image # 1 ′ having the same content as the texture image decoded by the image decoding unit 12 of the moving image encoding apparatus 1.
- the decoded texture image # 1 ′ is then output.
- the image decoding unit 12 outputs the picture type of the decoded texture image # 1 ′ and the reference picture information to the distance value decoding unit 32.
- the image division processing unit 21 ′ divides the entire area of the texture image # 1 ′ into a plurality of segments (areas) using the same algorithm as the image division processing unit 21 of the moving image encoding device 1. Then, the image division processing unit 21 ′ outputs segment information # 21 ′ including the position information of each segment to the distance value giving unit 33.
- the distance value decoding unit 32 decodes the representative value # 23a and the segment number # 24 (decoded data) from the encoded distance image encoded data # 25, the reference flag # 25A, and the picture information # 25B. Thereby, the sequence 1301 of FIG. 13 encoded by the distance value encoding unit 25 of the moving image encoding apparatus 1 is decoded.
- when the reference flag # 25A is “0”, the data is encoded data that has been subjected to adaptive compression encoding, so adaptive decoding is performed.
- when the reference flag # 25A is “1”, the data is encoded data that has been subjected to static compression encoding, so static decoding is performed.
- the reference flag is “0”
- the first picture has been subjected to adaptive compression coding. Therefore, adaptive decoding is performed, and the sequence 1301 is decoded.
- the adaptive encoding is performed in the distance value encoding unit 25 of the moving image encoding apparatus 1
- the number of occurrences of each value of the distance value is counted, and a static encoding table 1901 is created.
- the code book 1902 is stored until the next encoded data is processed. Note that the storage period is not limited to the processing of the next encoded data, and may be until the code book 1902 is not required.
- the moving picture decoding apparatus 2 can create the code book 1902 for static decoding without transmitting it from the moving picture encoding apparatus 1 to the moving picture decoding apparatus 2. Therefore, it is possible to transmit the encoded data or the like with a greatly reduced amount of information.
- segment table 1201 in FIG. 12 is decoded. This segment table 1201 is output to the distance value assigning unit 33.
- based on the input representative value # 23a and segment number # 24, the distance value assigning unit 33 assigns the representative value of each segment as the pixel value (distance value) of the pixels included in that segment, thereby restoring the distance image # 2 ′. Then, the restored distance image # 2 ′ is output.
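The restoration step can be sketched as a lookup from each pixel's segment number to that segment's representative value. The function name `restore_distance_image`, the per-pixel label map, and the sample table are assumptions made for illustration.

```python
def restore_distance_image(segment_labels, segment_table):
    """Give every pixel the representative distance value of its segment."""
    return [[segment_table[number] for number in row] for row in segment_labels]


# Hypothetical segment numbers per pixel and decoded representative values.
segment_labels = [
    [0, 0, 1],
    [2, 1, 1],
]
segment_table = {0: 89, 1: 182, 2: 89}
for row in restore_distance_image(segment_labels, segment_table):
    print(row)
```

Two different segments may share a representative value, as segments 0 and 2 do here, which is why the table is keyed by segment number rather than by value.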
- texture image # 1 and distance image # 2 can be decoded.
- the texture image screen is divided into segments by the following method
- the input texture image is an image of 1024 × 768 dots
- about several thousand segments for example, 3000 to 5000 segments
- from the input texture image # 1 ′, the image division processing unit 21 defines a plurality of segments such that the difference between the average value calculated from the pixel values of the pixel group included in a segment and the average value calculated from the pixel values of the pixel group included in a segment adjacent to that segment is equal to or less than a predetermined threshold value.
- FIG. 28 is a flowchart showing an operation in which the video encoding device 1 defines a plurality of segments based on the above algorithm.
- FIG. 29 is a flowchart showing a subroutine of segment combination processing in the flowchart of FIG.
- in the initialization step in FIG. 28, for the texture image subjected to the smoothing process, the image division processing unit 21 defines one independent segment (provisional segment) for each of all the pixels included in the texture image, and sets the pixel value itself of the corresponding pixel as the average value (average color) of all the pixel values in each provisional segment (S41).
- the process then proceeds to the segment combination processing step (S42), and the provisional segments having similar colors are combined.
- This segment combining process will be described in detail below with reference to FIG. 29, and this combining process is repeated until no further combination occurs.
- the image division processing unit 21 performs the following processing (S51 to S55) for all provisional segments.
- the image division processing unit 21 determines whether or not the height and width of the temporary segment of interest are both equal to or less than a threshold value (S51). If it is determined that both are equal to or lower than the threshold (YES in S51), the process proceeds to step S52. On the other hand, when it is determined that any one is larger than the threshold value (NO in S51), the process of step S51 is performed for the temporary segment to be focused next.
- the temporary segment that should be noted next may be, for example, the temporary segment that is positioned next to the temporary segment that is focused in the raster scan order.
- the image division processing unit 21 selects a temporary segment having an average color closest to the average color of the temporary segment of interest among the temporary segments adjacent to the temporary segment of interest (S52).
- as an index for judging the closeness of colors, for example, the Euclidean distance between vectors when the three RGB values of pixel values are regarded as a three-dimensional vector can be used.
- as the pixel value of each segment, the average value of all pixel values included in that segment is used.
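The colour comparison used in steps S52 and S53 can be sketched as follows. The function names and the sample colours are assumptions made for illustration, and `math.dist` requires Python 3.8 or later.

```python
import math


def average_color(pixels):
    """Average RGB value of all pixels in a segment."""
    n = len(pixels)
    return tuple(sum(pixel[c] for pixel in pixels) / n for c in range(3))


def color_distance(a, b):
    """Euclidean distance between two RGB values seen as 3-D vectors."""
    return math.dist(a, b)


def closest_neighbor(target, neighbors):
    """Index of the adjacent segment whose average colour is nearest (S52)."""
    return min(range(len(neighbors)),
               key=lambda i: color_distance(target, neighbors[i]))


target = average_color([(10, 10, 10), (14, 10, 10)])   # (12.0, 10.0, 10.0)
neighbors = [(200.0, 0.0, 0.0), (12.0, 13.0, 14.0)]
best = closest_neighbor(target, neighbors)
print(best, color_distance(target, neighbors[best]))   # 1 5.0
```

The distance returned for the best neighbour is the quantity that would then be compared against the threshold in step S53.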
- the image division processing unit 21 determines whether or not the color proximity between the temporary segment of interest and the temporary segment determined to have the closest color is equal to or less than a certain threshold (S53). If it is determined that the value is larger than the threshold value (NO in S53), the process of step S51 is performed for the temporary segment that should be noted next. On the other hand, if it is determined that the value is equal to or less than the threshold (YES in S53), the process proceeds to step S54.
- the image division processing unit 21 combines the two provisional segments (the provisional segment of interest and the provisional segment determined to be closest in color to it) into one provisional segment (S54).
- the number of provisional segments is reduced by 1 by the process of step S54.
- after step S54, the average value of the pixel values of all the pixels included in the combined segment is calculated (S55). If there is a segment that has not yet been subjected to the processing of steps S51 to S55, the processing of step S51 is performed for the temporary segment to be noticed next.
- after completing the processes of steps S51 to S55 for all the provisional segments, the process proceeds to the process of step S43.
- the image division processing unit 21 compares the number of provisional segments before the process of step S42 with the number of provisional segments after the process of step S42 (S43).
- the process returns to step S42.
- the image division processing unit 21 defines each current temporary segment as one segment.
- when the input texture image is an image of 1024 × 768 dots, it can be divided into about several thousand (for example, 3000 to 5000) segments.
- the segment is used to divide the distance image. Therefore, if the size of the segment becomes too large, various distance values are included in one segment, resulting in a pixel having a large error from the representative value, and the encoding accuracy of the distance image is lowered. Therefore, in the present invention, the process of step S51 is not essential, but it is desirable to prevent the segment size from becoming too large by limiting the segment size as in step S51.
- the number of segments can be made significantly smaller than the number of processing units for orthogonal transformation. Further, since the distance value in each segment is constant, it is not necessary to perform orthogonal transform, and the distance value can be transmitted with 8-bit information. Furthermore, in this embodiment, it is possible to further improve the compression efficiency by performing an adaptive compression encoding method and reusing a code book. Therefore, in this embodiment, the compression efficiency can be greatly improved as compared with the case where the texture video (image) and the distance video (image) are each encoded by the AVC encoding method.
- FIG. 24 is a flowchart showing the operation of the video decoding device 2.
- the operation of the moving image decoding apparatus 2 described here is an operation of decoding a texture image and a distance image of the t-th frame from the top in a three-dimensional moving image including a large number of frames. That is, the moving image decoding apparatus 2 repeats the operation described below as many times as the number of frames of the moving image in order to decode the entire moving image.
- each data # 1 to # 28 is interpreted as data at the t-th frame.
- the unpackaging unit 31 extracts, from the encoded data # 28 received from the moving image encoding apparatus 1, the texture image encoded data # 11, the distance image encoded data # 25, the reference flag # 25A, and the picture information # 25B. Then, the unpackaging unit 31 outputs the encoded data # 11 to the image decoding unit 12, and outputs the encoded data # 25, the reference flag # 25A, and the picture information # 25B to the distance value decoding unit 32 (S21, acquisition step).
- the image decoding unit 12 decodes the texture image # 1 ′ from the input encoded data # 11, and outputs it to the image division processing unit 21 ′ and to a stereoscopic video display device (not shown) outside the moving image decoding device 2 (S22). Further, the image decoding unit 12 outputs the picture information # 11A, which indicates the type of the selected picture and the reference picture, to the distance value decoding unit 32.
- the image division processing unit 21 ′ defines a plurality of segments using the same algorithm as the image division processing unit 21 of the moving image encoding device 1.
- the image division processing unit 21 ′ replaces the pixel value of each pixel included in each segment with a representative value in the raster scan order in the texture image # 1 ′, so that the segment identification image # 21 ′ is generated.
- the image division processing unit 21 ′ outputs the segment identification image # 21 ′ to the distance value providing unit 33 (S23).
- the distance value decoding unit 32 decodes the binary string 1701 described above from the encoded data # 25 of the distance image, the reference flag # 25A, and the picture information # 25B. Further, the distance value decoding unit 32 decodes the segment number and the representative value # 23a from the binary string 1701. Then, the distance value decoding unit 32 outputs the obtained representative value # 23a and segment number # 24 to the distance value giving unit 33 (S24, decoding step).
- the distance value assigning unit 33 decodes the distance image # 2 ′ by converting, based on the input representative value # 23a and segment number # 24, the pixel values of all the pixels of each segment in the segment identification image # 21 ′ into the representative value # 23a of that segment. Then, the distance value assigning unit 33 outputs the distance image # 2 ′ to the above-described stereoscopic video display device (S25, image generation step).
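- Step S25 can be sketched as a per-pixel lookup. The sketch assumes that the segment identification image stores a segment number at every pixel and that the decoded representative values are available as a mapping from segment number to distance value; the function name and data layout are hypothetical and not taken from the specification.

```python
from typing import Dict, List


def assign_distance_values(segment_ids: List[List[int]],
                           representative: Dict[int, int]) -> List[List[int]]:
    """Replace every segment number in the segment identification image
    with the representative distance value decoded for that segment."""
    return [[representative[seg] for seg in row] for row in segment_ids]


# Toy 3x3 segment identification image with three segments (0, 1, 2).
segment_ids = [
    [0, 0, 1],
    [0, 2, 1],
    [2, 2, 1],
]
# Decoded (segment number -> representative distance value) pairs.
representative = {0: 120, 1: 35, 2: 200}

distance_image = assign_distance_values(segment_ids, representative)
```

Because every pixel of a segment receives the same value, the result is piecewise constant, which is why the decoded distance image only approximates the original.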
- the distance image # 2 ′ decoded by the distance value assigning unit 33 in step S25 generally approximates the distance image # 2 input to the video encoding device 1.
- the distance image # 2 ′ is the same as the image obtained by changing, in the distance image # 2, the distance values of a very small part of each segment to the representative value of that segment. It can therefore be said that the distance image # 2 ′ approximates the distance image # 2.
- the moving image transmission system including the moving image encoding device 1 and the moving image decoding device 2 described above also exhibits the above-described effects.
- This embodiment is different from the first embodiment in that there are a plurality of viewpoints of texture images and distance images corresponding to the texture images. That is, the moving image encoding device 1A according to the present embodiment performs the texture image and distance image encoding processing using the same encoding method as the moving image encoding device 1 of the first embodiment. This is different from the moving image encoding apparatus 1 in that a plurality of sets of texture images and distance images are encoded per frame.
- the plurality of sets of texture images and distance images are images of subjects simultaneously captured by cameras and ranging devices installed at a plurality of locations so as to surround the subject. That is, the plurality of sets of texture images and distance images are images for generating a free viewpoint image.
- Each set of texture images and distance images includes camera parameters such as camera position, direction, and focal length as metadata, along with actual data of the texture images and distance images of the set.
- FIG. 25 is a block diagram showing a main configuration of the moving picture encoding apparatus 1A according to the present embodiment.
- the moving image encoding apparatus 1A includes an image encoding unit (MVC encoding unit) 11A, an image decoding unit (MVC decoding unit) 12A, a distance image encoding unit 20A, and a packaging unit 28 ′.
- the distance image encoding unit 20A includes an image division processing unit 21, a distance image division processing unit 22, a distance value correction unit 23, a number assigning unit 24, and a distance value encoding unit (adaptive encoding unit and output unit) 25A.
- the image encoding unit 11A performs the same encoding as the image encoding unit 11 described above, but differs in that it compresses and encodes images from a plurality of viewpoints. Specifically, the image encoding unit 11A performs encoding using MVC (Multiview Video Coding).
- AVC, used in the first embodiment, is a standard for compressing and encoding video (images) from one viewpoint, whereas MVC is a standard for compressing and encoding multi-view video (images). Therefore, the encoded data # 11 output from the image encoding unit 11A is MVC encoded data.
- MVC coding performs the prediction described in the first embodiment between viewpoints as well, in order to eliminate redundancy between viewpoints. This will be specifically described with reference to FIG. 26. FIG. 26 is a diagram for explaining MVC encoding.
- an image is predicted in block units from the time direction and the viewpoint direction (space direction).
- the images 2303 and 2305 can be referred to as images in the time direction
- the images 2302 and 2304 can be referred to as images in the viewpoint direction.
- the same method can be used to eliminate the redundancy between images in the time direction and the redundancy between images in the spatial direction.
- a reference destination image is generated in the spatial direction as in the temporal direction.
- Up to two reference images in the spatial direction can be referred to.
- the image decoding unit 12A decodes the texture image # 1 ′ from the encoded data # 11 of the texture image # 1 obtained from the image encoding unit 11A. Then, the texture image # 1 ′ is output to the image division processing unit 21.
- the distance value encoding unit 25A performs compression encoding processing on the data in which the segment number # 24 and the representative value # 23a are associated, and outputs the obtained encoded data # 25 to the packaging unit 28 ′.
- the packaging unit 28 ′ integrates the encoded data # 11 (-1 to -N) of the texture images # 1-1 to # 1-N, the encoded data # 25 (-1 to -N), the reference flags # 25A (-1 to -N), and the picture information # 25B (-1 to -N) to generate encoded data # 28 ′. Then, the packaging unit 28 ′ transmits the generated encoded data # 28 ′ to the video decoding device 2A.
- FIG. 27 is a block diagram showing a main configuration of the moving picture decoding apparatus 2A.
- the moving image decoding apparatus 2A includes an image decoding unit 12A, an image division processing unit 21 ′, an unpackaging unit 31 ′, a distance value decoding unit 32A, and a distance value giving unit 33.
- Upon receiving the encoded data # 28 ′, the unpackaging unit 31 ′ extracts the encoded data # 11 (-1 to -N), the encoded data # 25 (-1 to -N), the reference flags # 25A (-1 to -N), and the picture information # 25B (-1 to -N), outputs the encoded data # 11 to the image decoding unit 12A, and outputs the encoded data # 25 to the distance value decoding unit 32A.
- the reference image is determined according to the algorithm shown in FIG. 24, the distance value is decoded by reusing the code book, and the distance image is restored.
- the moving picture encoding apparatus 1 and the moving picture decoding apparatus 2 described above can be used by being mounted on various apparatuses that perform moving picture transmission, reception, recording, and reproduction.
- First, the use of the moving picture encoding apparatus 1 and the moving picture decoding apparatus 2 described above for transmission and reception of moving pictures is described.
- FIG. 30 (a) is a block diagram showing a configuration of a transmission apparatus A in which the moving picture encoding apparatus 1 is mounted.
- the transmitting apparatus A includes an encoding unit A1 that obtains encoded data by encoding a moving image, a modulation unit A2 that obtains a modulated signal by modulating a carrier wave with the encoded data obtained by the encoding unit A1, and a transmission unit A3 that transmits the modulated signal obtained by the modulation unit A2.
- the moving image encoding device 1 described above is used as the encoding unit A1.
- the transmission apparatus A may further include, as supply sources of the moving image input to the encoding unit A1, a camera A4 that captures a moving image, a recording medium A5 on which a moving image is recorded, and an input terminal A6 for inputting a moving image from the outside.
- FIG. 30A illustrates a configuration in which the transmission apparatus A includes all of these, but some of them may be omitted.
- the recording medium A5 may hold a non-encoded moving image, or a moving image encoded by a recording encoding scheme different from the transmission encoding scheme. In the latter case, a decoding unit (not shown) for decoding the encoded data read from the recording medium A5 according to the recording encoding method may be interposed between the recording medium A5 and the encoding unit A1.
- FIG. 30B is a block diagram illustrating a configuration of the receiving device B on which the moving image decoding device 2 is mounted.
- the receiving device B includes a receiving unit B1 that receives a modulated signal, a demodulating unit B2 that obtains encoded data by demodulating the modulated signal received by the receiving unit B1, and a decoding unit B3 that obtains a moving image by decoding the encoded data obtained by the demodulating unit B2.
- the moving picture decoding apparatus 2 described above is used as the decoding unit B3.
- the receiving apparatus B may further include, as supply destinations of the moving image output from the decoding unit B3, a display B4 for displaying the moving image, a recording medium B5 for recording the moving image, and an output terminal B6 for outputting the moving image to the outside.
- FIG. 30B illustrates a configuration in which the receiving apparatus B includes all of these, but a part of the configuration may be omitted.
- the recording medium B5 may be for recording an unencoded moving image, or for recording a moving image encoded by a recording encoding method different from the transmission encoding method.
- In the latter case, an encoding unit (not shown) that encodes the moving image acquired from the decoding unit B3 in accordance with the recording encoding method may be interposed between the decoding unit B3 and the recording medium B5.
- the transmission medium for transmitting the modulation signal may be wireless or wired.
- the transmission mode for transmitting the modulated signal may be broadcasting (here, a transmission mode in which the transmission destination is not specified in advance) or communication (here, a transmission mode in which the transmission destination is specified in advance). That is, the transmission of the modulation signal may be realized by any of wireless broadcasting, wired broadcasting, wireless communication, and wired communication.
- a terrestrial digital broadcast broadcasting station (such as broadcasting equipment) / receiving station (such as a television receiver) is an example of a transmitting device A / receiving device B that transmits and receives a modulated signal by wireless broadcasting.
- a broadcasting station (such as broadcasting equipment) / receiving station (such as a television receiver) for cable television broadcasting is an example of a transmitting device A / receiving device B that transmits and receives a modulated signal by cable broadcasting.
- a server (workstation etc.) / client (television receiver, personal computer, smartphone etc.) of a VOD (Video On Demand) service or a video sharing service using the Internet is an example of a transmitting device A / receiving device B that transmits and receives a modulated signal by communication (usually, either wireless or wired is used as a transmission medium in a LAN, and wired is used as a transmission medium in a WAN).
- the personal computer includes a desktop PC, a laptop PC, and a tablet PC.
- the smartphone also includes a multi-function mobile phone terminal.
- the video sharing service client has a function of encoding a moving image captured by the camera and uploading it to the server. That is, the client of the video sharing service functions as both the transmission device A and the reception device B.
- FIG. 31A is a block diagram showing a configuration of a recording apparatus C equipped with the moving picture encoding apparatus 1 described above.
- the recording device C includes an encoding unit C1 that obtains encoded data by encoding a moving image, and a writing unit C2 that writes the encoded data obtained by the encoding unit C1 to the recording medium M.
- the moving image encoding device 1 described above is used as the encoding unit C1.
- the recording medium M may be (1) of a type built into the recording device C, such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), (2) of a type connected to the recording device C, such as an SD memory card or USB (Universal Serial Bus) flash memory, or (3) of a type loaded into a drive device (not shown) built into the recording apparatus C, such as a DVD (Digital Versatile Disc) or BD (Blu-ray Disc: registered trademark).
- the recording apparatus C may further include, as supply sources of the moving image input to the encoding unit C1, a camera C3 that captures a moving image, an input terminal C4 for inputting a moving image from the outside, and a receiving unit C5 for receiving a moving image.
- FIG. 31A illustrates a configuration in which the recording apparatus C includes all of these, but some of them may be omitted.
- the receiving unit C5 may receive an unencoded moving image, or may receive encoded data encoded by a transmission encoding method different from the recording encoding method. In the latter case, a transmission decoding unit (not shown) that decodes encoded data encoded by the transmission encoding method may be interposed between the reception unit C5 and the encoding unit C1.
- Examples of such a recording device C include a DVD recorder, a BD recorder, and an HD (Hard Disk) recorder (in these cases, the input terminal C4 or the receiving unit C5 is the main source of moving images).
- a camcorder (in this case, the camera C3 is the main source of moving images), a personal computer (in this case, the receiving unit C5 is the main source of moving images), and a smartphone (in this case, the camera C3 or the receiving unit C5 is the main source of moving images) are also examples of such a recording apparatus C.
- FIG. 31 (b) is a block diagram showing the configuration of the playback device D on which the above-described moving image decoding device 2 is mounted.
- the playback device D includes a reading unit D1 that reads encoded data written on the recording medium M, and a decoding unit D2 that obtains a moving image by decoding the encoded data read by the reading unit D1.
- the moving picture decoding apparatus 2 described above is used as the decoding unit D2.
- the recording medium M may be (1) of a type built into the playback device D, such as an HDD or SSD, (2) of a type connected to the playback device D, such as an SD memory card or USB flash memory, or (3) of a type loaded into a drive device (not shown) built into the playback device D, such as a DVD or BD.
- the playback device D may further include, as supply destinations of the moving image output by the decoding unit D2, a display D3 for displaying the moving image, an output terminal D4 for outputting the moving image to the outside, and a transmitting unit D5 for transmitting the moving image.
- FIG. 31B illustrates a configuration in which the playback apparatus D includes all of these, but a part of the configuration may be omitted.
- the transmission unit D5 may transmit a non-encoded moving image, or may transmit encoded data encoded by a transmission encoding method different from the recording encoding method. In the latter case, an encoding unit (not shown) that encodes a moving image with a transmission encoding method may be interposed between the decoding unit D2 and the transmission unit D5.
- Examples of such a playback device D include a DVD player, a BD player, and an HDD player (in these cases, an output terminal D4 to which a television receiver or the like is connected is the main supply destination of moving images).
- a television receiver (in this case, the display D3 is the main destination of moving images), a desktop PC (in this case, the output terminal D4 or the transmission unit D5 is the main destination of moving images), a laptop or tablet PC (in this case, the display D3 or the transmission unit D5 is the main destination of moving images), and a smartphone (in this case, the display D3 or the transmission unit D5 is the main destination of moving images) are also examples of such a playback device D.
- the moving image coding apparatus 1 includes the distance image division processing unit 22, which divides each frame image of a moving image into a plurality of regions; a unit that determines the representative value of each region divided by the distance image division processing unit 22; and a distance value encoding unit 25, which selects either adaptive encoding or static encoding and generates encoded data of the frame image using the selected encoding method.
- the moving image encoding apparatus is a moving image encoding apparatus that encodes a moving image, and includes: image dividing means for dividing each frame image of the moving image into a plurality of regions; representative value determining means for determining a representative value of each region divided by the image dividing means; coding method selection means for selecting, for each frame image, either adaptive coding, in which a number sequence of the representative values determined by the representative value determining means and arranged in a predetermined order is coded while adaptively updating an adaptive codebook in which sequence patterns and codewords are associated, or static coding, in which the representative values arranged in the predetermined order are coded by assigning to each representative value a codeword whose number of bits depends on the appearance rate of that representative value in the frame image; and coding means for generating encoded data of the frame image using the coding method selected by the coding method selection means.
- the control method of the moving image encoding device is a control method of a moving image encoding device that encodes a moving image, and includes, in the moving image encoding device: an image dividing step of dividing each frame image of the moving image into a plurality of regions; a representative value determining step of determining a representative value of each region divided in the image dividing step; an encoding method selection step of selecting, for each frame image, either adaptive encoding, in which a number sequence of the representative values determined in the representative value determining step and arranged in a predetermined order is encoded while adaptively updating an adaptive codebook in which sequence patterns and codewords are associated, or static encoding, in which the representative values arranged in the predetermined order are encoded by allocating codewords having different numbers of bits depending on the appearance rate of each representative value in the frame image; and an encoding step of generating encoded data of the frame image using the encoding method selected in the encoding method selection step.
- each frame image of a moving image is divided into a plurality of regions, and a representative value of each divided region is determined. Then, for each frame image, the representative values are arranged in a predetermined order, and either adaptive encoding or static encoding is selected. Each frame image is then encoded using the selected encoding method.
- the predetermined order is an order in which the position corresponding to the representative value can be specified in the frame image.
- the order in which any pixel included in each region is first scanned when the frame image is raster scanned can be set as a predetermined order.
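- The raster-scan ordering described here can be sketched as follows. The sketch assumes a label image that holds a region label at every pixel and a mapping from label to representative value; both the function names and the data layout are hypothetical.

```python
from typing import Dict, Hashable, List


def raster_scan_order(labels: List[List[Hashable]]) -> List[Hashable]:
    """Return region labels in the order in which a raster scan
    (row by row, left to right) first meets a pixel of each region."""
    seen = set()
    order = []
    for row in labels:
        for label in row:
            if label not in seen:
                seen.add(label)
                order.append(label)
    return order


def representative_sequence(labels: List[List[Hashable]],
                            representative: Dict[Hashable, int]) -> List[int]:
    """Arrange one representative value per region in raster-scan order."""
    return [representative[label] for label in raster_scan_order(labels)]


labels = [
    ["a", "a", "b"],
    ["c", "a", "b"],
    ["c", "c", "b"],
]
representative = {"a": 120, "b": 35, "c": 200}
sequence = representative_sequence(labels, representative)
```

Since a decoder that derives the same regions can repeat the same scan, the position of each value in the sequence is enough to identify its region.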
- If the encoding method with the smaller amount of information after encoding is selected, more highly compressed encoded data can be generated. Further, if an encoding method with few processing procedures is selected, more efficient encoding can be performed.
- the moving image encoding apparatus may further include static codebook creating means that calculates the appearance rate of each representative value in a frame image to be adaptively encoded, determines the number of bits of the codeword to be assigned to each representative value according to the calculated appearance rate, and creates a static codebook in which each representative value is associated with a codeword of the determined number of bits, and the coding means may statically encode the frame image using the static codebook that the static codebook creating means created when a previous frame image was adaptively coded.
- a static codebook created previously can be used when performing static encoding. Therefore, it is not necessary to newly perform a process for creating a static codebook, and the efficiency of the static encoding process can be improved.
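- One common way to give frequent values shorter codewords is Huffman coding. The sketch below uses it as a stand-in for the static codebook creation described above; the specification text in this section does not name the construction, so the algorithm choice and the function name are assumptions.

```python
import heapq
from collections import Counter
from typing import Dict, List


def build_static_codebook(values: List[int]) -> Dict[int, str]:
    """Build a prefix-free codebook (value -> bit string) in which values
    with a higher appearance rate receive codewords with fewer bits."""
    counts = Counter(values)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, unique tie breaker, {value: partial codeword}).
    heap = [(c, i, {v: ""}) for i, (v, c) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c0, _, left = heapq.heappop(heap)
        c1, _, right = heapq.heappop(heap)
        # Prepend one bit to every codeword in each merged subtree.
        merged = {v: "0" + w for v, w in left.items()}
        merged.update({v: "1" + w for v, w in right.items()})
        heapq.heappush(heap, (c0 + c1, tie, merged))
        tie += 1
    return heap[0][2]


# Representative values of one frame: 10 is frequent, 30 is rare.
frame_values = [10] * 5 + [20] * 2 + [30]
codebook = build_static_codebook(frame_values)
encoded = "".join(codebook[v] for v in frame_values)
```

Reusing such a codebook for a later frame avoids both rebuilding it and retransmitting it, provided the decoder can derive the identical codebook from the frame it already decoded.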
- the coding method selection means may select, between adaptive coding and static coding, the coding method that yields the smaller amount of information after coding of the frame image to be coded.
- encoding is performed using an encoding method in which the information amount of encoded data after encoding is small. Therefore, encoding can be performed with an encoding method having a higher compression rate.
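- The selection rule can be sketched by estimating the encoded size under both methods and keeping the smaller. In this sketch the adaptive method is modeled as an LZW-style dictionary coder and the static method by a table of codeword lengths; both are stand-ins chosen for illustration, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def lzw_bits(values: List[int], alphabet_size: int = 256) -> int:
    """Estimated size in bits of an LZW-style adaptive coding of `values`.
    The dictionary starts with every single value and grows as new sequence
    patterns appear; each emitted code is costed at the current dictionary width."""
    dictionary = {(v,): v for v in range(alphabet_size)}
    bits = 0
    current: Tuple[int, ...] = ()
    for v in values:
        candidate = current + (v,)
        if candidate in dictionary:
            current = candidate
        else:
            bits += (len(dictionary) - 1).bit_length()
            dictionary[candidate] = len(dictionary)
            current = (v,)
    if current:
        bits += (len(dictionary) - 1).bit_length()
    return bits


def static_bits(values: List[int], code_lengths: Dict[int, int]) -> Optional[int]:
    """Size in bits under a static codebook, or None if a value has no codeword."""
    if any(v not in code_lengths for v in values):
        return None
    return sum(code_lengths[v] for v in values)


def select_method(values: List[int], code_lengths: Dict[int, int]) -> Tuple[str, int]:
    """Pick whichever method yields fewer bits for this frame."""
    adaptive = lzw_bits(values)
    static = static_bits(values, code_lengths)
    if static is not None and static < adaptive:
        return "static", static
    return "adaptive", adaptive


code_lengths = {10: 1, 20: 2, 30: 2}   # lengths from an earlier frame's codebook
choice = select_method([10, 10, 10, 20, 10, 10, 30, 10], code_lengths)
```

For a short sequence like this one the static codebook wins easily; adaptive coding tends to pay off on longer sequences with repeating patterns.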
- the static code book creating means may create a static code book every time a frame image is adaptively coded, and the coding means may statically encode the frame image to be coded using, among the plurality of static codebooks created by the static codebook creating means, the static codebook that yields the smallest amount of information after coding the frame image to be coded.
- static coding can be performed using a code book having the smallest information amount of coded data after coding among a plurality of static code books. Therefore, encoding with a higher compression rate can be performed.
- the static code book creating means may hold the created static code books and, when the number of held static code books exceeds a predetermined number, discard them starting with the oldest.
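- Holding a bounded number of codebooks and discarding the oldest first can be sketched with an ordered mapping. The class name, the use of a frame index as key, and the limit of two are hypothetical choices for illustration.

```python
from collections import OrderedDict
from typing import Dict, List, Optional


class CodebookStore:
    """Keeps at most `max_books` static codebooks, dropping the oldest first."""

    def __init__(self, max_books: int) -> None:
        self.max_books = max_books
        self._books: "OrderedDict[int, Dict[int, str]]" = OrderedDict()

    def add(self, frame_index: int, codebook: Dict[int, str]) -> None:
        """Store the codebook created for `frame_index`."""
        self._books[frame_index] = codebook
        while len(self._books) > self.max_books:
            self._books.popitem(last=False)   # discard the oldest entry

    def get(self, frame_index: int) -> Optional[Dict[int, str]]:
        """Return the stored codebook, or None if it was never stored or was discarded."""
        return self._books.get(frame_index)

    def frames(self) -> List[int]:
        """Frame indices of the codebooks currently held, oldest first."""
        return list(self._books)


store = CodebookStore(max_books=2)
store.add(0, {10: "0", 20: "1"})
store.add(1, {10: "1", 20: "0"})
store.add(2, {10: "0", 30: "1"})   # exceeds the limit, so frame 0 is dropped
```

An encoder and a decoder that apply the same limit and the same discard rule hold the same set of codebooks at every frame, so a frame reference is enough to identify which one to use.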
- the moving image may be a moving image of a plurality of viewpoints, and the encoding means may use a static codebook that the static codebook creating means created when a frame image of a different viewpoint was adaptively encoded.
- when performing static coding, the coding means may use static codebooks other than those in which no codeword is assigned to one or more representative values of the frame image to be coded.
- encoding can be performed using a static codebook that can be used for static encoding.
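- Choosing among several held codebooks, while skipping those that lack a codeword for some value of the current frame, can be sketched as below. The function name and the keying of codebooks by frame index are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def best_static_codebook(values: List[int],
                         codebooks: Dict[int, Dict[int, str]]
                         ) -> Optional[Tuple[int, int]]:
    """Return (frame index of the codebook, encoded size in bits) for the usable
    codebook giving the fewest bits, or None if no codebook covers all values."""
    best: Optional[Tuple[int, int]] = None
    for frame_index, book in codebooks.items():
        if any(v not in book for v in values):
            continue                      # a value has no codeword: unusable
        bits = sum(len(book[v]) for v in values)
        if best is None or bits < best[1]:
            best = (frame_index, bits)
    return best


held = {
    0: {10: "0", 20: "10", 30: "11"},
    1: {10: "11", 20: "0"},
    2: {10: "0"},                         # no codeword for 20
}
picked = best_static_codebook([10, 10, 20], held)
```

When the function returns None, no held codebook can represent the frame, and adaptive coding would be the fallback.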
- the representative value may be a numerical value included in a predetermined range, and the static codebook creating means may create a static code book in which codewords are also assigned to those numerical values in the predetermined range that differ from the representative values in the frame image for which the static code book is created.
- a static code book is created in which a code word is associated with a numerical value that is not a representative value. Therefore, it can be prevented that static coding cannot be performed because there is no corresponding code word.
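- A codebook that covers the whole range can be sketched by giving values that never appeared a count of zero and still including them in the code construction. The sketch assumes Huffman code lengths and an 8-bit range of 0 to 255; both are assumptions made for illustration, and the function name is hypothetical.

```python
import heapq
from collections import Counter
from typing import Dict, List


def full_range_code_lengths(values: List[int],
                            low: int = 0, high: int = 255) -> Dict[int, int]:
    """Huffman codeword lengths for every value in [low, high].
    Values absent from `values` get count 0, so they still receive a
    (long) codeword and can be encoded if they appear in a later frame."""
    if high == low:
        return {low: 1}
    counts = Counter(values)
    lengths = {v: 0 for v in range(low, high + 1)}
    # Heap entries: (count, unique tie breaker, values in this subtree).
    heap = [(counts.get(v, 0), v, [v]) for v in range(low, high + 1)]
    heapq.heapify(heap)
    tie = high + 1
    while len(heap) > 1:
        c0, _, m0 = heapq.heappop(heap)
        c1, _, m1 = heapq.heappop(heap)
        members = m0 + m1
        for v in members:
            lengths[v] += 1               # one level deeper in the code tree
        heapq.heappush(heap, (c0 + c1, tie, members))
        tie += 1
    return lengths


lengths = full_range_code_lengths([10] * 50 + [20] * 10)
```

The cost of full coverage is small for the frequent values, which keep short codewords, while the unused values share the long ones.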
- the moving picture decoding apparatus is a moving picture decoding apparatus that decodes image encoded data, that is, data obtained by dividing each frame image of a moving picture into a plurality of areas and encoding the representative values of the areas by either adaptive coding, in which a number sequence of the representative values arranged in a predetermined order is coded while adaptively updating a codebook in which sequence patterns and codewords are associated, or static coding, in which the representative values arranged in the predetermined order are coded by allocating to each representative value a codeword whose number of bits depends on the appearance rate of that representative value in the frame image. The apparatus includes: acquisition means for acquiring the image encoded data; decoding means for decoding the image encoded data to generate decoded data; and image generation means for generating each frame image of the moving image from the decoded data generated by the decoding means and information indicating the areas.
- the control method of the moving picture decoding apparatus is a method for controlling a moving picture decoding apparatus that decodes image encoded data, that is, data obtained by dividing each frame image of a moving picture into a plurality of areas and encoding the representative values of the areas by either adaptive coding, in which a number sequence of the representative values arranged in a predetermined order is coded while adaptively updating a codebook in which number sequence patterns and codewords are associated, or static coding, in which the representative values arranged in the predetermined order are coded by allocating codewords having different bit numbers depending on the appearance rate of each representative value in the frame image. The method includes, in the moving picture decoding apparatus: a decoding step of decoding each piece of image encoded data corresponding to a frame image to generate decoded data; and an image generation step of generating each frame image of the moving image from the decoded data generated in the decoding step and information indicating the areas.
- image encoded data is decoded, the image encoded data being data obtained by dividing each frame image of a moving image into a plurality of regions and encoding the representative values of the regions by either adaptive coding, in which a number sequence of the representative values arranged in a predetermined order is coded while adaptively updating a codebook in which sequence patterns and codewords are associated, or static coding, in which the representative values arranged in the predetermined order are coded by allocating codewords whose number of bits varies depending on the appearance rate of each representative value in the frame image. Then, an image is generated from the decoded data and the information indicating the regions.
- adaptive decoding is performed for the adaptively encoded image encoded data, and static decoding is performed for the statically encoded image encoded data. As described above, decoding can be performed appropriately.
- the moving picture decoding apparatus may further include static codebook creating means that calculates the appearance rate of each representative value from the decoded data generated when adaptive decoding, which is the decoding method corresponding to adaptive encoding, is performed, determines the number of bits of the codeword to be assigned to each representative value according to the calculated appearance rate, and creates a static codebook in which each representative value is associated with a codeword of the determined number of bits, and the decoding means may perform static decoding, which is the decoding method corresponding to static encoding, on the image encoded data using the static codebook that the static codebook creating means created from the decoded data generated when previous image encoded data was adaptively decoded.
- a static codebook created previously can be used when performing static decoding. Therefore, it is not necessary to newly perform a process for creating a static codebook, and the efficiency of the static decoding process can be improved.
- when the image encoded data is statically encoded, the acquisition means may acquire image specifying information indicating the frame image for which the codebook used in the static encoding was created, and the decoding means may statically decode the statically encoded image encoded data using the static codebook that the static codebook creating means created when the image encoded data of the frame image indicated by the image specifying information was adaptively decoded.
- the static codebook creation means may hold the created static codebooks and, when the number of held static codebooks exceeds a predetermined number, discard them starting with the oldest.
- the representative value may be a numerical value included in a predetermined range, and the static codebook creating means may create a static codebook in which codewords are also assigned to those numerical values in the predetermined range that are not included in the decoded data from which the static codebook is created.
- a static codebook is created in which a numerical value that is not a representative value is associated with the codeword. Therefore, it can be prevented that static coding cannot be performed because there is no corresponding code word.
- the moving image transmission system including the moving image encoding device and the moving image decoding device can achieve the effects described above.
- the moving image encoding device and the moving image decoding device may be realized by a computer.
- In that case, a control program that realizes the moving image encoding device and the moving image decoding device on a computer by causing the computer to operate as the respective means, and a computer-readable recording medium on which the control program is recorded, also fall within the scope of the present invention.
- each block of the moving image encoding device 1 (1A) and the moving image decoding device 2 (2A), particularly the image encoding unit 11 (11A), the image decoding unit 12, the distance image encoding unit 20 (20A) (the image division processing unit 21 (21 ′), distance image division processing unit 22, distance value correction unit 23, number assigning unit 24, and distance value encoding unit 25 (25A)), the distance value decoding unit 32, and the distance value providing unit 33, may be realized in hardware by a logic circuit formed on an integrated circuit (IC chip), or may be realized in software using a CPU (central processing unit).
- the moving image encoding device 1 (1A) and the moving image decoding device 2 (2A) include a CPU that executes the instructions of a control program realizing each function, a ROM (read-only memory) that stores the program, a RAM (random access memory) into which the program is loaded, and a storage device (recording medium) such as a memory that stores the program and various data.
- the object of the present invention can also be achieved by supplying, to the moving image encoding device 1 (1A) and the moving image decoding device 2 (2A), a recording medium on which the program code (executable program, intermediate code program, source program) of the control program, which is software realizing the functions described above, is recorded in computer-readable form, and having the computer (or a CPU or MPU (microprocessor unit)) read and execute the program code recorded on the recording medium.
- examples of the recording medium include tapes such as magnetic tapes and cassette tapes; disks, including magnetic disks such as floppy (registered trademark) disks and hard disks, and optical discs such as CD-ROM (compact disc read-only memory), MO (magneto-optical), MD (Mini Disc), DVD (digital versatile disc), and CD-R (CD Recordable); cards such as IC cards (including memory cards) and optical cards; semiconductor memories such as mask ROM, EPROM (erasable programmable read-only memory), EEPROM (electrically erasable and programmable read-only memory), and flash ROM; and logic circuits such as PLDs (programmable logic devices) and FPGAs (field-programmable gate arrays).
- the moving image encoding device 1 (1A) and the moving image decoding device 2 (2A) may be configured to be connectable to a communication network, and the program code may be supplied via the communication network.
- the communication network is not particularly limited as long as it can transmit the program code.
- for example, the Internet, an intranet, an extranet, a LAN (local area network), an ISDN (integrated services digital network), a VAN (value-added network), a CATV (community antenna television) communication network, a virtual private network, a telephone line network, a mobile communication network, a satellite communication network, and the like can be used.
- the transmission medium constituting the communication network may be any medium that can transmit the program code, and is not limited to a specific configuration or type.
- for example, wired media such as IEEE (Institute of Electrical and Electronics Engineers) 1394, USB, power line carrier, cable TV lines, telephone lines, and ADSL (asymmetric digital subscriber line) lines can be used, as can wireless media such as infrared (IrDA (Infrared Data Association) or remote control), Bluetooth (registered trademark), IEEE 802.11 wireless, HDR (high data rate), NFC (Near Field Communication), DLNA (Digital Living Network Alliance), mobile phone networks, satellite links, and terrestrial digital networks.
- the present invention can also be realized in the form of a computer data signal embedded in a carrier wave in which the program code is embodied by electronic transmission.
- the present invention can be suitably applied to a content generation device that generates 3D-compatible content, a content playback device that plays back 3D-compatible content, and the like.
DESCRIPTION OF SYMBOLS
- 1: Video encoding device (moving image encoding device)
- 2: Video decoding device (moving image decoding device)
- 11, 11A: Image encoding unit
- 12: Image decoding unit
- 22: Distance image division processing unit (image division means)
- 24: Number assigning unit (representative value determining means)
- 25, 25A: Distance value encoding unit (encoding method selection means, encoding means, static codebook creation means)
- 31: Unpacking unit (acquisition means)
- 32: Distance value decoding unit (decoding means, static codebook creation means)
- 33: Distance value assigning unit (image generating means)
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
A movie image encoding device comprises: a distance image segmentation processing unit (22) that segments each frame image of a movie image into a plurality of regions; a number assignment unit (24) that determines a representative value for each region segmented by the distance image segmentation processing unit (22); and a distance value encoding unit (25) that selects whether to encode a numerical sequence, wherein the representative values determined by the number assignment unit (24) are arranged in a predetermined order, by either adaptive encoding or static encoding, and that generates encoding data for the frame image using the selected encoding method.
Description
The present invention relates to a moving image encoding device that encodes distance video, a method of controlling the moving image encoding device, and a control program for the moving image encoding device; to a moving image decoding device that decodes the encoded data they produce, a method of controlling the moving image decoding device, and a control program for the moving image decoding device; and to a moving image transmission system including these devices and a recording medium.
In recent years, displays that present the three-dimensional shape of a subject have become widespread. There are broadly two formats for presenting a three-dimensional shape. One is the glasses method, in which the user perceives the displayed subject in three dimensions by wearing dedicated glasses when viewing the display. The other is the naked-eye method, in which the user perceives the displayed subject in three dimensions without wearing glasses.
In the glasses method, the subject is perceived in three dimensions by viewing a left-eye image with the left eye and a right-eye image with the right eye, so two images, one for each eye, are required.
In the naked-eye method, on the other hand, images from 8 to 9 viewpoints are required. Transmitting each of these 8 to 9 viewpoint images individually would make the amount of transmitted data enormous. Various techniques for transmitting multi-viewpoint images have therefore been proposed.
For example, instead of transmitting each of the multi-viewpoint images, there is a method that records and transmits two kinds of video: ordinary two-dimensional video (texture video) and distance video, whose images represent the distance from the camera to the subject. By transmitting a texture image and a distance image, the multi-viewpoint images can be created at the destination (as described later), so the amount of transmitted data is smaller than when each multi-viewpoint image is transmitted.
Here, a distance image is an image that expresses, pixel by pixel, the distance from the camera to the subject for every subject in the image. The distance from the camera to the subject can be obtained with a distance measuring device installed near the camera, or by analyzing texture video captured by cameras at two or more viewpoints.
For distance images, the Moving Picture Experts Group (MPEG), a working group of the International Organization for Standardization / International Electrotechnical Commission (ISO/IEC), has defined MPEG-C part3, a standard that represents distance depth (the distance from the camera to the subject) in 256 levels, that is, as an 8-bit luminance value.
Under this standard, a distance image is an image expressed in 8-bit grayscale. Because a higher luminance value is assigned the shorter the distance, subjects closer to the camera appear whiter and subjects farther away appear blacker.
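As a minimal sketch of the "nearer is brighter" convention, the following maps a distance to an 8-bit value. The function name, the `z_near`/`z_far` clipping bounds, and the linear mapping are assumptions made for illustration; they are not the quantization formula of the standard.

```python
def depth_to_luma(distance, z_near, z_far):
    """Map a distance to an 8-bit level: z_near -> 255 (white), z_far -> 0 (black).

    The linear mapping is an illustrative assumption, not the standard's formula.
    """
    if z_far <= z_near:
        raise ValueError("z_far must be greater than z_near")
    # Clamp so that distances outside the range saturate at 0 or 255.
    clamped = min(max(distance, z_near), z_far)
    fraction_near = (z_far - clamped) / (z_far - z_near)
    return round(255 * fraction_near)


if __name__ == "__main__":
    for d in (1.0, 2.0, 4.0, 5.0):
        print(d, depth_to_luma(d, z_near=1.0, z_far=5.0))
```

Every pixel of a distance image then holds one of 256 levels, which is what allows a subject to be reconstructed as a shape with 256 depth steps.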
Given a texture image and a distance image, the distance of each pixel of the subject shown in the texture image is known, so the subject can be reconstructed as a three-dimensional shape with 256 depth levels. Because that shape can be geometrically projected onto the two-dimensional plane of another viewpoint, the texture image can be converted into a texture image seen from that other viewpoint.
However, a texture image from a single viewpoint has blind spots behind the subject that do not appear in the image, so a simple projection transformation produces blank pixels (occlusions) that the transformation cannot fill.
Occlusions are therefore prevented by using images from multiple viewpoints. Specifically, when the texture image from viewpoint A is projected to a texture image from virtual viewpoint B, the texture image from viewpoint C, a viewpoint different from A, is likewise projected to a texture image from virtual viewpoint B. This yields two images from the same virtual viewpoint B. Because the blind spots of the texture image from viewpoint A differ from those of the texture image from viewpoint C, occlusions in the image derived from one viewpoint can be filled in from the image derived from the other.
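The filling step can be sketched as follows, assuming both source views have already been projected to virtual viewpoint B and that unfilled pixels are marked with `None`. The `HOLE` marker, the function name, and the rule of preferring the view from A are illustrative assumptions; the projection itself is not shown.

```python
HOLE = None  # assumed marker for a pixel that projection could not fill


def fill_occlusions(view_from_a, view_from_c):
    """Merge two images of the same virtual viewpoint, pixel by pixel.

    Pixels are taken from the view projected from A, and holes in it are
    filled from the view projected from C. A pixel that is a hole in both
    views remains a hole.
    """
    if len(view_from_a) != len(view_from_c):
        raise ValueError("views must have the same number of rows")
    merged = []
    for row_a, row_c in zip(view_from_a, view_from_c):
        if len(row_a) != len(row_c):
            raise ValueError("rows must have the same width")
        merged.append([pa if pa is not HOLE else pc for pa, pc in zip(row_a, row_c)])
    return merged


if __name__ == "__main__":
    from_a = [[10, HOLE, 30]]
    from_c = [[HOLE, 21, 31]]
    print(fill_occlusions(from_a, from_c))
```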
In general, occlusions can be filled in for a projection to any virtual viewpoint on the line connecting viewpoint A and viewpoint C, so an image from a virtual viewpoint on that line can be created.
With this technique, texture images from 8 to 9 virtual viewpoints can be created from, for example, texture images at 2 or 3 viewpoints and the distance image corresponding to each. Consequently, if the 2-viewpoint or 3-viewpoint texture images and their corresponding distance images are transmitted, the receiving side can create texture images from 8 to 9 virtual viewpoints, and the amount of transmitted data can be reduced.
Non-Patent Document 1 discloses a method of compressing video from multiple viewpoints by efficiently eliminating the redundancy between the videos (sequences whose frames are images) of those viewpoints. Applying this to two groups, the multiple texture videos and the multiple distance videos, eliminates redundancy among the texture videos and among the distance videos, so the transmitted data can be compressed.
Patent Document 1 discloses a method of generating a distance video at a specific viewpoint by applying the projection transformation described above to a distance video at a certain viewpoint to create a distance video at the specific viewpoint, and then removing the holes in the created distance video.
First, the characteristics of distance images are described. A distance image expresses the distance from the camera to the subject as stepwise discrete values for each pixel, and has the following features. The first feature is that the edges of the subject are shared with the texture image. That is, as long as the texture image contains information that distinguishes the subject from the background, the boundary (edge) between subject and background is common to the texture image and the distance image. The edge information of the subject is therefore one of the major components of the correlation between a texture image and a distance image.
The second feature is that the distance depth value is relatively flat inside the edges of the subject.
For example, when the subject is a person, the texture image shows information such as the pattern of the clothes the person is wearing, but the distance image shows no such pattern and expresses only distance depth. The distance depth values over a single subject are therefore flat, or change more gently than the texture image does.
Because of these two features, if pixels are partitioned into ranges over which the distance depth value is constant, each range has a single distance depth value, so very efficient encoding is possible without an orthogonal transform or the like. Moreover, if the partitioning is determined by some rule based on the texture image, information about the partitioned ranges need not be transmitted, which improves encoding efficiency further.
Here, a group of pixels included in a range partitioned on the basis of distance depth value is called a segment. The fewer the segments, the higher the encoding efficiency, so encoding efficiency is improved by leaving the segment shape unrestricted and flexible.
Consider, then, dividing a distance image into segments of various shapes. The distribution of the distance depth values of the segments is often similar between temporally adjacent frames, or similar to that of a distance image corresponding to a different viewpoint image.
Exploiting this property to eliminate redundancy between temporally adjacent frames, or between distance images corresponding to different viewpoint images, therefore allows further compression.
Non-Patent Document 1 improves compression efficiency for texture images by reusing square segments between temporally adjacent frames or between images from different viewpoints. More specifically, Non-Patent Document 1 divides an image into square segments (blocks) and, using motion compensation vectors or disparity compensation vectors, compresses the data by reusing blocks between temporally adjacent frames or between images from different viewpoints.
However, applying the method described in Non-Patent Document 1 to a distance image divided into the flexibly shaped segments described above makes the encoding efficiency extremely poor. This is because the method described in Patent Document 1 is suited to an approach in which segments are square and each segment is orthogonally transformed; when segments have flexible shapes, the amount of vector information to be transmitted becomes enormous.
Furthermore, the method described in Patent Document 1 substitutes a single viewpoint for the distance videos of multiple viewpoints, so errors become large and quality degrades significantly.
As described above, when the encoding method is restricted to a single one, preferable encoding is not always achieved, because each encoding method has its own advantages and disadvantages.
The present invention has been made in view of the above problems, and its object is to realize a moving image encoding device and the like that can select the encoding method.
To solve the above problem, a moving image encoding device according to the present invention is a moving image encoding device that encodes a moving image, comprising: an image dividing means that divides each frame image of the moving image into a plurality of regions; a representative value determining means that determines a representative value for each region divided by the image dividing means; an encoding means that generates encoded data by performing at least one of adaptive encoding and static encoding; and an encoding method selection means that selects, for each frame image, either the adaptive encoding or the static encoding. In the adaptive encoding, for each frame image, a numerical sequence in which the representative values determined by the representative value determining means are arranged in a predetermined order is encoded while an adaptive codebook that associates sequence patterns with codewords is adaptively updated. In the static encoding, for each frame image, the representative values determined by the representative value determining means are arranged in a predetermined order, and each representative value is encoded by assigning it a codeword whose number of bits depends on the appearance rate of that representative value in the frame image. The encoding means encodes the frame image using the encoding method selected by the encoding method selection means and generates encoded data.
A method of controlling a moving image encoding device according to the present invention is a method of controlling a moving image encoding device that encodes a moving image, and includes the following steps performed in the moving image encoding device: an image dividing step of dividing each frame image of the moving image into a plurality of regions; a representative value determining step of determining a representative value for each region divided in the image dividing step; an encoding method selection step of selecting, for each frame image, either adaptive encoding or static encoding; and an encoding step that performs at least one of the adaptive encoding and the static encoding, and that encodes the frame image using the encoding method selected in the encoding method selection step to generate encoded data. In the adaptive encoding, a numerical sequence in which the representative values determined in the representative value determining step are arranged in a predetermined order is encoded while an adaptive codebook that associates sequence patterns with codewords is adaptively updated. In the static encoding, the representative values determined in the representative value determining step are arranged in a predetermined order, and each representative value is encoded by assigning it a codeword whose number of bits depends on the appearance rate of that representative value in the frame image.
According to the above configuration or method, each frame image of the moving image is divided into a plurality of regions, and a representative value is determined for each region. Then, for each frame image, the representative values are arranged in a predetermined order, and either adaptive encoding or static encoding is selected for encoding the resulting sequence. Each frame image is then encoded with the selected encoding method.
Here, the predetermined order is an order from which the position in the frame image of the region corresponding to each representative value can be identified. One example is the order in which, when the frame image is raster-scanned, a pixel belonging to each region is first encountered.
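The raster-scan ordering given as an example can be sketched as follows. The input representation (a 2D list of region ids per pixel and a dictionary from region id to representative value) and the function name are assumptions for illustration.

```python
def representative_sequence(labels, representative):
    """Arrange representative values in raster-scan first-appearance order.

    labels: 2D list where labels[y][x] is the region id of pixel (x, y).
    representative: dict mapping region id -> representative value.
    Scanning rows top to bottom and pixels left to right, each region
    contributes its value the first time one of its pixels is reached.
    """
    seen = set()
    sequence = []
    for row in labels:
        for region in row:
            if region not in seen:
                seen.add(region)
                sequence.append(representative[region])
    return sequence


if __name__ == "__main__":
    labels = [
        [2, 2, 0],
        [1, 2, 0],
        [1, 1, 0],
    ]
    print(representative_sequence(labels, {0: 200, 1: 50, 2: 120}))
```

Because a decoder that knows the same region layout can repeat the same scan, the position of each value in the sequence identifies its region without any extra position data.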
As a result, either adaptive encoding or static encoding can be selected for each frame image at encoding time, so each frame image can be encoded with the more preferable encoding method.
For example, selecting the encoding method that yields the smaller amount of information after encoding produces more highly compressed encoded data, and selecting the encoding method with fewer processing steps makes encoding more efficient.
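The "smaller amount of information" criterion can be sketched by estimating the encoded size under each method and keeping the smaller. As stand-ins, the sketch uses an LZW-style dictionary coder for the adaptive encoding and a Huffman code for the static encoding; these choices, the function names, the 256-value alphabet, and the omission of the cost of transmitting a static codebook are all assumptions, not details confirmed by the text.

```python
import heapq
from collections import Counter


def lzw_bits(seq, alphabet_size=256):
    """Bit count of an LZW-style encoding: the codebook of sequence patterns
    grows as the input is read, and each code uses just enough bits to
    index the current codebook."""
    book = {(v,): v for v in range(alphabet_size)}
    bits = 0
    current = ()
    for value in seq:
        candidate = current + (value,)
        if candidate in book:
            current = candidate
        else:
            bits += max(1, (len(book) - 1).bit_length())
            book[candidate] = len(book)
            current = (value,)
    if current:
        bits += max(1, (len(book) - 1).bit_length())
    return bits


def huffman_bits(seq):
    """Bit count of a Huffman encoding built from the frame's own value counts.
    The cost of sending the codebook itself is ignored here."""
    counts = Counter(seq)
    if len(counts) <= 1:
        return len(seq)  # convention: a lone symbol still costs 1 bit each
    heap = list(counts.values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged  # each merge adds one bit to every symbol beneath it
        heapq.heappush(heap, merged)
    return total


def choose_method(seq):
    """Return (method, bits) for whichever estimate is smaller (ties go to static)."""
    adaptive, static = lzw_bits(seq), huffman_bits(seq)
    return ("adaptive", adaptive) if adaptive < static else ("static", static)


if __name__ == "__main__":
    print(choose_method([7] * 64))
    print(choose_method([1, 2, 3, 4] * 200))
```

A long sequence with recurring patterns tends to favor the adaptive estimate, while a short sequence dominated by a few values tends to favor the static one, which is why a per-frame choice can pay off.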
To solve the above problem, a moving image decoding device according to the present invention decodes encoded image data, which is data obtained by dividing each frame image of a moving image into a plurality of regions and encoding the representative values of the regions by either adaptive encoding or static encoding. In the adaptive encoding, a numerical sequence in which the representative values of the regions are arranged in a predetermined order is encoded while a codebook that associates sequence patterns with codewords is adaptively updated. In the static encoding, the representative values are arranged in a predetermined order, and each representative value is encoded by assigning it a codeword whose number of bits depends on the appearance rate of that representative value in the frame image. The moving image decoding device comprises: an acquisition means that acquires the encoded image data and encoding information, which is information indicating the encoding method of the encoded image data; a decoding means that, for each piece of encoded image data corresponding to a frame image, decodes the encoded image data with the decoding method corresponding to the encoding method indicated by the encoding information acquired by the acquisition means, and generates decoded data; and an image generation means that generates each frame image of the moving image from the decoded data generated by the decoding means and information indicating the regions.
A method of controlling a moving image decoding device according to the present invention is a method of controlling a moving image decoding device that decodes encoded image data, which is data obtained by dividing each frame image of a moving image into a plurality of regions and encoding the representative values of the regions by either adaptive encoding or static encoding. In the adaptive encoding, a numerical sequence in which the representative values of the regions are arranged in a predetermined order is encoded while a codebook that associates sequence patterns with codewords is adaptively updated. In the static encoding, the representative values are arranged in a predetermined order, and each representative value is encoded by assigning it a codeword whose number of bits depends on the appearance rate of that representative value in the frame image. The method includes the following steps performed in the moving image decoding device: an acquisition step of acquiring the encoded image data and encoding information, which is information indicating the encoding method of the encoded image data; a decoding step of decoding, for each piece of encoded image data corresponding to a frame image, the encoded image data with the decoding method corresponding to the encoding method indicated by the encoding information acquired in the acquisition step, and generating decoded data; and an image generation step of generating each frame image of the moving image from the decoded data generated in the decoding step and information indicating the regions.
According to the above configuration or method, encoded image data is decoded that was produced by dividing each frame image of a moving image into a plurality of regions and encoding the representative values of the regions by either adaptive encoding, in which a numerical sequence of the representative values arranged in a predetermined order is encoded while a codebook that associates sequence patterns with codewords is adaptively updated, or static encoding, in which the representative values are arranged in a predetermined order and each is assigned a codeword whose number of bits depends on the appearance rate of that representative value in the frame image. An image is then generated from the decoded data and the information indicating the regions.
As a result, each piece of encoded image data corresponding to a frame image can be decoded appropriately: adaptive decoding is applied to adaptively encoded image data, and static decoding is applied to statically encoded image data.
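A decoder that switches on the encoding information can be sketched as follows. The frame container (a dictionary with a `method` field) is a hypothetical format, and the LZW-style and prefix-code decoders are stand-ins chosen for illustration rather than the decoders specified by the text.

```python
def lzw_decode(codes, alphabet_size=256):
    """Decode LZW-style codes, rebuilding the adaptive codebook while decoding."""
    if not codes:
        return []
    book = {v: (v,) for v in range(alphabet_size)}
    previous = book[codes[0]]
    out = list(previous)
    for code in codes[1:]:
        if code in book:
            entry = book[code]
        elif code == len(book):
            # The code refers to the entry that is about to be created.
            entry = previous + (previous[0],)
        else:
            raise ValueError(f"invalid code: {code}")
        out.extend(entry)
        book[len(book)] = previous + (entry[0],)
        previous = entry
    return out


def static_decode(bits, codebook):
    """Decode a string of '0'/'1' characters with a prefix-free codebook
    given as {value: codeword}."""
    inverse = {word: value for value, word in codebook.items()}
    out, word = [], ""
    for bit in bits:
        word += bit
        if word in inverse:
            out.append(inverse[word])
            word = ""
    if word:
        raise ValueError("trailing bits do not form a codeword")
    return out


def decode_frame(frame):
    """Pick the decoder that matches the frame's encoding information."""
    if frame["method"] == "adaptive":
        return lzw_decode(frame["codes"])
    if frame["method"] == "static":
        return static_decode(frame["bits"], frame["codebook"])
    raise ValueError(f"unknown encoding method: {frame['method']}")


if __name__ == "__main__":
    print(decode_frame({"method": "adaptive", "codes": [7, 256]}))
    print(decode_frame({"method": "static", "bits": "0100",
                        "codebook": {120: "0", 50: "10"}}))
```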
The moving image encoding device and the moving image decoding device may be realized by a computer. In that case, a control program that realizes the moving image encoding device and the moving image decoding device on a computer by causing the computer to operate as each of the means described above, and a computer-readable recording medium on which the control program is recorded, also fall within the scope of the present invention.
As described above, the moving image encoding device according to the present invention comprises: an image dividing means that divides each frame image of a moving image into a plurality of regions; a representative value determining means that determines a representative value for each region divided by the image dividing means; an encoding means that generates encoded data by performing at least one of adaptive encoding and static encoding; and an encoding method selection means that selects, for each frame image, either the adaptive encoding or the static encoding. In the adaptive encoding, for each frame image, a numerical sequence in which the representative values determined by the representative value determining means are arranged in a predetermined order is encoded while an adaptive codebook that associates sequence patterns with codewords is adaptively updated. In the static encoding, for each frame image, the representative values are arranged in a predetermined order, and each representative value is encoded by assigning it a codeword whose number of bits depends on the appearance rate of that representative value in the frame image. The encoding means encodes the frame image using the encoding method selected by the encoding method selection means and generates encoded data.
The method of controlling a moving image encoding device according to the present invention is a method of controlling a moving image encoding device that encodes a moving image, and includes the following steps performed in the moving image encoding device: an image dividing step of dividing each frame image of the moving image into a plurality of regions; a representative value determining step of determining a representative value for each region divided in the image dividing step; an encoding method selection step of selecting, for each frame image, either adaptive encoding or static encoding; and an encoding step that performs at least one of the adaptive encoding and the static encoding, and that encodes the frame image using the encoding method selected in the encoding method selection step to generate encoded data. In the adaptive encoding, a numerical sequence in which the representative values determined in the representative value determining step are arranged in a predetermined order is encoded while an adaptive codebook that associates sequence patterns with codewords is adaptively updated. In the static encoding, the representative values are arranged in a predetermined order, and each representative value is encoded by assigning it a codeword whose number of bits depends on the appearance rate of that representative value in the frame image.
Consequently, when encoding is performed, either adaptive encoding or static encoding can be selected for each frame image, so that each frame image can be encoded with the more suitable encoding method.
Further, the decoding device according to the present invention includes: acquiring means for acquiring image encoded data and encoding information, which is information indicating the encoding method of the image encoded data; decoding means for decoding, for each piece of image encoded data corresponding to a frame image, the image encoded data with a decoding method corresponding to the encoding method indicated by the encoding information acquired by the acquiring means, thereby generating decoded data; and image generating means for generating each frame image of the moving image from the decoded data generated by the decoding means and information indicating the regions.
Further, the method of controlling a moving image decoding device according to the present invention includes, in the moving image decoding device: an acquiring step of acquiring the image encoded data and encoding information, which is information indicating the encoding method of the image encoded data; a decoding step of decoding, for each piece of image encoded data corresponding to a frame image, the image encoded data with a decoding method corresponding to the encoding method indicated by the encoding information acquired in the acquiring step, thereby generating decoded data; and an image generating step of generating each frame image of the moving image from the decoded data generated in the decoding step and information indicating the regions.
Consequently, each piece of image encoded data corresponding to a frame image can be decoded appropriately: adaptive decoding is performed on image encoded data that was adaptively encoded, and static decoding is performed on image encoded data that was statically encoded.
[Embodiment 1]
One embodiment of the present invention is described below with reference to FIGS. 1 to 24. First, the moving image encoding device 1 according to the present embodiment is described. Roughly speaking, the moving image encoding device 1 according to the present embodiment is a device that, for each frame constituting a three-dimensional moving image, generates encoded data by encoding the texture image and the distance image (an image in which each pixel value is expressed as a depth value) that constitute the frame.
The moving image encoding device 1 according to the present embodiment uses the encoding technique adopted in the H.264/MPEG-4 (Moving Picture Experts Group) AVC (Advanced Video Coding) standard to encode texture images, while it uses an encoding technique unique to the present invention to encode distance images.
The encoding technique unique to the present invention was developed by focusing on the correlation between a texture image and a distance image. The two images are correlated in that, when the texture image contains information indicating an edge of a subject, the distance image tends to have the subject's edge at the same position, and all or substantially all of the pixels contained in a subject region tend strongly to take the same distance value.
(Configuration of the image encoding device)
First, the configuration of the moving image encoding device according to the present embodiment is described with reference to FIG. 1. FIG. 1 is a block diagram showing the main configuration of the moving image encoding device 1.
As shown in FIG. 1, the moving image encoding device 1 includes an image encoding unit (AVC encoding means) 11, an image decoding unit 12, a distance image encoding unit 20, and a packaging unit 28. The distance image encoding unit 20 in turn includes an image division processing unit 21, a distance image division processing unit (image dividing means) 22, a distance value correcting unit 23, a number assigning unit (representative value determining means) 24, and a distance value encoding unit (encoding method selecting means, encoding means, static codebook creating means) 25.
The image encoding unit 11 encodes the texture image #1 by the AVC (Advanced Video Coding) encoding defined in the H.264/MPEG-4 AVC standard. It then outputs the encoded data (AVC encoded data) #11 to the image decoding unit 12 and the packaging unit 28.
The image encoding unit 11 also outputs, to the distance value encoding unit 25, picture information #11A indicating the type of the selected picture (predicted image) (described later) and information identifying the type of the selected picture and the picture it references (when the selected picture is a P picture or a B picture). Picture information #11A includes information indicating the type of the selected picture so that, in the present embodiment, when an IDR picture is encoded, the codebooks used to encode earlier pictures are not referenced. An IDR picture is a picture for refreshing the decoding operation on the decoding side.
The image decoding unit 12 decodes the texture image #1′ from the encoded data #11 of the texture image #1 acquired from the image encoding unit 11. It then outputs the texture image #1′ to the image division processing unit 21.
Note that, provided no bit errors occur in transmission from the moving image encoding device 1, the decoded texture image #1′ is identical to the image decoded by the moving image decoding device 2 on the receiving side, but it differs from the original texture image #1. This is because, when the original texture image #1 is AVC-encoded, quantization errors arise in the coefficients of the orthogonal transform, which is applied in units called blocks that partition the pixels into squares.
Further, the texture images #1′ are not necessarily output from the image decoding unit 12 to the image division processing unit 21 in frame order. The reason is explained with reference to FIG. 2. FIG. 2 is a diagram for explaining which pictures a given picture references in AVC encoding. The pictures (201 to 209) in FIG. 2 make up the video in this order, and the video (moving image) results from the pictures being switched over time. A picture here means a single image (frame image) at a given discrete time.
To remove redundancy in the time direction, AVC encoding performs prediction between temporally preceding and following pictures. Here, prediction means dividing the screen into rectangular regions (blocks) of a fixed size and, for each region of the picture to be encoded, searching the regions of other pictures that temporally precede or follow it for one that closely resembles it.
Pictures are classified as I pictures, P pictures, or B pictures according to how the pictures used for prediction are selected. An I picture is a picture that is not predicted from any other picture. A P picture is a picture that uses only temporally forward pictures for prediction. A B picture is a picture that uses both temporally forward and temporally backward pictures for prediction. Prediction is performed block by block; in the case of a B picture, for example, up to two reference pictures can be specified.
A B picture therefore needs to reference an I picture or a P picture that comes later in time, and it can be decoded only once the referenced picture has arrived. For example, the B picture 201 in FIG. 2 can be decoded only after the I picture 202 that it references has arrived. Likewise, the B picture 203, whose reference pictures are the I picture 202 and the P picture 205, can be decoded only after the P picture 205 has arrived.
Accordingly, for a B picture that references a later picture, the image decoding unit 12 can perform decoding only after the referenced I picture or P picture has arrived. In some cases, therefore, an I picture that is temporally later is decoded first and output to the image division processing unit 21.
The image division processing unit 21 divides the entire area of the texture image into a plurality of segments (regions). It then outputs segment information #21, consisting of the position information of each segment, to the distance image division processing unit 22. The position information of a segment is information representing the position of that segment in the texture image #1.
One possible method of dividing a texture image into a plurality of segments is as follows.
First, the texture image is repeatedly smoothed while its edge information is preserved. This removes the noise in the image. Adjacent segments of closely similar color are then merged with one another. However, a segment whose width or height exceeds a predetermined number of pixels is more likely to contain varying distance values, so such a segment is divided so that it does not exceed the predetermined number of pixels. In this way, the texture video can be divided into segments.
This point is described in more detail with reference to FIGS. 3 to 5. FIGS. 3 to 5 are diagrams for explaining an example of dividing a texture image.
For example, suppose the image at a certain discrete time is the image 301 shown in FIG. 3. When the image 301 is input to the image division processing unit 21, the image division processing unit 21 divides it into segments as shown in the image 401 of FIG. 4. In the image 301, the girl's hair on either side of her parting is drawn in two colors, brown and light brown, and the image division processing unit 21 defines a closed region made up of pixels of similar colors, such as brown and light brown, as a single segment (FIG. 4). The skin of the girl's face is also drawn in two colors, a skin tone and the pink of the cheeks, but the image division processing unit 21 defines the skin-tone region and the pink regions as separate segments (FIG. 4). This is because the skin tone and the pink are not similar colors (that is, the difference between the skin-tone pixel value and the pink pixel value exceeds a predetermined threshold). In the image 401, each closed region drawn with the same pattern represents one segment.
The processing described above, in which a segment is divided into smaller segments when its width or height exceeds a predetermined number of pixels, is omitted here.
Extracting only the segment shapes from the image 401 yields the information shown in the image 501 of FIG. 5.
FIG. 6 shows a distance image 601 corresponding to the image 301 (texture image) of FIG. 3. As shown in FIG. 6, the distance image is an image whose distance value differs from segment to segment.
When the distance image (frame image) #2, which is a frame image of the distance video, and the segment information #21 are input, the distance image division processing unit 22 extracts, for each segment in the texture image #1′, a distance value set consisting of the distance values of the pixels contained in the corresponding segment (region) of the distance image #2. From the segment information #21, the distance image division processing unit 22 then generates segment information #22, in which a distance value set and position information are associated with each segment, and outputs the generated segment information #22 to the distance value correcting unit 23.
Specifically, the distance image division processing unit 22 refers to the input segment information #21 to identify the position of each segment in the texture image #1′, and divides the distance image #2 into a plurality of segments using the same division pattern as that of the segments in the texture image #1′. The segment division pattern of the texture image #1′ and that of the distance image #2 are therefore identical.
For each segment of the distance image #2, the distance value correcting unit 23 calculates the mode of the segment's distance value set contained in the segment information #22 as the representative value #23a. That is, when segment i in the distance image #2 contains N pixels, the distance value correcting unit 23 calculates the mode of the N distance values. Instead of the mode, the distance value correcting unit 23 may calculate the mean, the median, or the like of the N distance values as the representative value #23a. When the calculated mean, median, or similar value is not an integer, the distance value correcting unit 23 may round it to an integer by rounding down, rounding up, rounding to the nearest integer, or the like.
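The representative value calculation can be illustrated with a minimal Python sketch. The function names and the tie-breaking rule (choosing the smallest value when several distance values are equally frequent) are assumptions made for this illustration; the description does not specify how ties are resolved.

```python
from collections import Counter


def representative_value(distance_values):
    """Return the mode of a segment's distance values.

    Tie-breaking (smallest value wins) is an assumption of this sketch.
    """
    counts = Counter(distance_values)
    highest = max(counts.values())
    return min(value for value, count in counts.items() if count == highest)


def flatten_segment(distance_values):
    """Replace every distance value in a segment with the representative value."""
    representative = representative_value(distance_values)
    return [representative] * len(distance_values)


# A segment of N = 5 pixels whose edge pixels carry stray distance values.
segment = [10, 10, 12, 10, 11]
flattened = flatten_segment(segment)
```

Here `flattened` holds the same distance value for all five pixels, which is the state the subsequent numbering and encoding stages rely on.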
The distance value correcting unit 23 then replaces the distance value set of each segment contained in the segment information #22 with the representative value #23a of that segment, and outputs the result to the number assigning unit 24 as segment information #23.
The reason for calculating the representative value #23a is as follows. Ideally, all pixels contained in a given segment of the distance image #2 have the same distance value. However, when the edges of the texture image #1 and the distance image #2 are misaligned, for example because of the poor accuracy of the distance image #2, pixel groups with different distance values may exist within the same segment of the distance image #2. In such a case, if, for example, the distance value of a pixel group with a small pixel value is replaced with the distance value of the pixel group with the largest pixel value, all distance values within the segment become the same, and the distance image #2 can be rounded to the shapes of the segments.
Moreover, since edges are generally more accurate in the texture image #1 than in the distance image #2, the above processing also has the effect of improving the poor accuracy of the edges in the distance image #2.
When the segment information #23 is input, the number assigning unit 24 associates mutually distinct identifiers with the representative values #23a contained in the segment information #23. Specifically, for each of the M pairs of position information and representative value #23a contained in the segment information #23, the number assigning unit 24 associates the representative value #23a with a segment number #24 determined by the position information. The number assigning unit 24 then outputs the data in which the segment numbers #24 and the representative values #23a are associated to the distance value encoding unit 25.
Note that pixels connected in the vertical or horizontal direction belong to the same segment when their distance values are identical, whereas pixels with identical distance values that adjoin only diagonally are not regarded as belonging to the same segment. In other words, a segment is formed by a group of pixels of the same distance value connected in the vertical or horizontal direction.
This is described concretely with reference to FIGS. 7 to 9. FIGS. 7 to 9 are diagrams for explaining the pixels contained in a segment.
Pixel A and pixel B shown in FIGS. 7 and 8 are connected in the vertical or horizontal direction, so they belong to the same segment if their distance values are the same. Pixel A and pixel B shown in FIG. 9, on the other hand, are connected diagonally, so they do not belong to the same segment even if their distance values are the same. That is, pixel A and pixel B in FIG. 9 belong to different segments.
Next, the method of assigning segment numbers #24 is described with reference to FIGS. 10 to 12. FIGS. 10 to 12 are diagrams for explaining the method of assigning segment numbers #24.
Segment numbers #24 only need to be unique within the same image (frame) of the same video. One conceivable method is therefore to scan the pixels row by row from the upper left to the lower right of the image (raster scan, FIG. 10) and, whenever the segment containing the pixel being scanned has not yet been assigned a number, to assign it the next number, starting from 0.
For example, when segment numbers #24 are assigned to the image 501 of FIG. 5 by raster scan, the segment R0 that comes first in raster scan order is assigned segment number "0", as shown in FIG. 11. The segment R1 that comes second in raster scan order is assigned segment number "1". Similarly, the segments R2 and R3 that come third and fourth in raster scan order are assigned segment numbers "2" and "3", respectively.
This yields data such as the segment table 1201 shown in FIG. 12. The data thus obtained is output to the distance value encoding unit 25.
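The numbering procedure, together with the rule that only vertically or horizontally connected pixels share a segment, can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the input is a small distance image already reduced to representative values, the function name `number_segments` is hypothetical, and a breadth-first flood fill is used to collect each segment, which is one possible way to realize the raster-scan assignment and not necessarily the one used in the embodiment.

```python
from collections import deque


def number_segments(image):
    """Assign segment numbers in raster-scan order using 4-connectivity.

    image: list of rows of distance values.
    Returns (labels, table), where labels[y][x] is the segment number of the
    pixel and table[n] is the representative value of segment n.
    """
    height, width = len(image), len(image[0])
    labels = [[None] * width for _ in range(height)]
    table = []
    for y in range(height):          # raster scan: top to bottom,
        for x in range(width):       # left to right within each row
            if labels[y][x] is not None:
                continue             # segment already numbered
            number = len(table)      # next unused number, starting from 0
            value = image[y][x]
            table.append(value)
            labels[y][x] = number
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                # Only vertical and horizontal neighbours; diagonals are ignored.
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and labels[ny][nx] is None
                            and image[ny][nx] == value):
                        labels[ny][nx] = number
                        queue.append((ny, nx))
    return labels, table


image = [
    [5, 5, 7],
    [5, 7, 7],
    [7, 5, 5],
]
labels, table = number_segments(image)
```

In this example the pixel with value 7 in the lower-left corner touches the other 7-valued pixels only diagonally, so it receives its own segment number. The list `table` plays the role of the segment table: its entries, read in segment number order, form the sequence of representative values passed to the encoder.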
The distance value encoding unit 25 applies compression encoding to the data in which the segment numbers #24 and the representative values #23a are associated (segment table 1201), and outputs the resulting encoded data (image encoded data) #25, a static compression encoding reference flag (encoding information) #25A, and reference picture information (image specifying information) #25B to the packaging unit 28.
This is described more concretely. Distance values are expressed in 256 levels, so the data in which the segment numbers #24 and the representative values #23a are associated can be expressed as a numerical sequence 1301, as shown in FIG. 13, in which values from 0 to 255 are arranged in segment number order, one value per segment.
This numerical sequence is then encoded by a hybrid method combining adaptive compression encoding and static compression encoding. The hybrid method performs compression encoding using whichever of adaptive compression encoding and static compression encoding is preferable.
Adaptive compression encoding is an encoding method in which a correspondence table (adaptive codebook) between codewords and the values before encoding (sequence patterns) is created during compression and is adaptively updated as compression proceeds. This method is suitable when the appearance rate of each value before encoding is not known. However, it compresses less effectively than static compression encoding.
Static compression encoding, on the other hand, is encoding in which, when the appearance rate of each value before encoding is known, the number of bits of each codeword is varied according to that appearance rate.
Static compression encoding requires the appearance rate of each value before encoding, so a numerical sequence whose appearance rates are known can be encoded with a high compression ratio. For a numerical sequence whose appearance rates are not known, however, the appearance rate of each value must first be obtained by scanning the sequence once to the end, counting the frequency of each value, and calculating the appearance rates. Static compression encoding is then performed on the basis of those appearance rates. The drawback is that the sequence must be scanned an extra time to obtain the appearance rates, which makes the processing take longer. Since the decoding device must perform decoding corresponding to the encoding method, processing takes correspondingly longer on the decoding side as well.
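The two-pass nature of static compression encoding can be illustrated with the following Python sketch. Huffman coding is used here only as a familiar example of a code whose codeword lengths depend on appearance rates; the description at this point does not state which static code the embodiment uses, so the choice of Huffman coding and the function names are assumptions of this sketch.

```python
import heapq
from collections import Counter


def build_static_codebook(sequence):
    """First pass: count each value, then build a prefix code (Huffman)."""
    counts = Counter(sequence)
    if len(counts) == 1:
        # A single distinct value still needs a one-bit codeword.
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, unique index for tie-breaking, partial codebook).
    heap = [(count, index, {value: ""})
            for index, (value, count) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_index = len(heap)
    while len(heap) > 1:
        count0, _, group0 = heapq.heappop(heap)   # two least frequent groups
        count1, _, group1 = heapq.heappop(heap)
        merged = {value: "0" + code for value, code in group0.items()}
        merged.update({value: "1" + code for value, code in group1.items()})
        heapq.heappush(heap, (count0 + count1, next_index, merged))
        next_index += 1
    return heap[0][2]


def static_encode(sequence):
    """Second pass: replace each value with its codeword."""
    codebook = build_static_codebook(sequence)
    return codebook, "".join(codebook[value] for value in sequence)


# Distance value 3 appears most often, so it receives the shortest codeword.
codebook, bits = static_encode([3, 3, 3, 3, 7, 7, 9])
```

The sequence is traversed once in `build_static_codebook` to obtain the frequencies and once more in `static_encode` to emit the codewords, which corresponds to the extra scan described above.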
In the present embodiment, therefore, adaptive compression encoding (adaptive encoding) and static compression encoding (static encoding) are used by switching between them (hybrid method). This makes it possible to achieve encoding with both high compression efficiency and a high compression ratio.
First, various adaptive compression encoding methods have been proposed, such as adaptive entropy encoding, in which the codebook (event occurrence probability table) of Huffman coding or arithmetic coding, both classified as entropy coding, is adaptively updated. Here, as one example, the Lempel-Ziv-Welch (LZW) encoding method is described, which adaptively updates the codebook (dictionary) of Lempel-Ziv coding, a representative example of dictionary coding. The LZW method is an encoding method developed by Terry Welch as one implementation of the LZ78 encoding method published by Abraham Lempel and Jacob Ziv in 1978. It focuses on the patterns in which values are arranged, and registers each newly appearing pattern in the codebook one after another while outputting codewords. On the decoding side, new patterns are registered in the codebook in the same way as on the encoding side, based on the received codewords, and decoding proceeds so that the original sequence is reproduced exactly. This method is therefore a so-called lossless encoding method, in which no information is lost through encoding. It is one of the encoding methods with excellent compression efficiency and is widely used in practice, for example in image compression.
The algorithm of the LZW method is shown in FIG. 14. Because the LZW method was developed for compressing character strings, the description assumes that a character string is being compressed. However, since a character string can be expressed as a sequence of binary digits (bits), the algorithm can be applied as is to the numerical sequence 1301 of distance values.
First, the codebook is initialized and every single character is registered in it (S51). For example, if only the three letters a, b, and c are used, these three letters are registered in the codebook, and the codewords 0, 1, and 2 are assigned to a, b, and c, respectively.
Next, the first character of the character string to be encoded is read and assigned to ω (ω is a variable) (S52). The next character is then read and assigned to K (K is a variable) (S53), and it is determined whether there is any further input (S54). If there is no further input (NO in S54), the codeword corresponding to the character string stored in ω is output and the processing ends (S55). If there is further input (YES in S54), it is determined whether the character string ωK exists in the codebook (S56).
If the character string ωK exists in the codebook (YES in S56), ωK is assigned to ω (S57), and the process returns to step S53.
If the character string ωK does not exist in the codebook (NO in S56), the codeword corresponding to the character string stored in ω is output (S58), the character string ωK is registered in the codebook (S59), and K is assigned to ω (S60). The process then returns to step S53.
The above is the algorithm of the LZW method. As the algorithm shows, the more often the same pattern occurs in the character string to be encoded, the more of the string can be replaced with single codewords, so substantial compression is possible.
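Steps S51 to S60 can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name `lzw_encode`, the `alphabet_size` parameter, and the use of integer tuples as codebook keys are assumptions made for the example, and no limit is placed on the codebook size.

```python
def lzw_encode(symbols, alphabet_size=256):
    """Encode a sequence of integers in range(alphabet_size) with LZW.

    Returns the list of codewords and the final codebook.
    """
    # S51: register every single symbol in the codebook.
    codebook = {(s,): s for s in range(alphabet_size)}
    it = iter(symbols)
    try:
        omega = (next(it),)            # S52: first symbol -> omega
    except StopIteration:
        return [], codebook            # empty input: nothing to output
    out = []
    for k in it:                       # S53/S54: read K while input remains
        candidate = omega + (k,)
        if candidate in codebook:      # S56: is omega+K already registered?
            omega = candidate          # S57
        else:
            out.append(codebook[omega])            # S58: output codeword of omega
            codebook[candidate] = len(codebook)    # S59: register omega+K
            omega = (k,)                           # S60
    out.append(codebook[omega])        # S55: output the last pattern
    return out, codebook


# Three-letter alphabet from the text: a=0, b=1, c=2.
codes, book = lzw_encode([0, 1, 0, 1, 0, 1, 0], alphabet_size=3)
print(codes)  # [0, 1, 3, 5]
```

In this small example seven input symbols are represented by four codewords, because the repeated patterns "ab" and "aba" are registered as codewords 3 and 5.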
Now consider replacing characters with distance values in this LZW algorithm and encoding the sequence 1301 shown in FIG. 13.
First, the codebook is initialized and all values from 0 to 255 are registered in it. At this point, codewords 0 to 255 are occupied, so the next registration starts at codeword 256. When the sequence 1301 is then encoded with the algorithm described above, the codebook 1501 shown in FIG. 15 is created and the codeword string 1601 shown in FIG. 16 is output.
Each codeword in the output codeword string 1601 is then converted into a 9-digit binary value as shown in FIG. 17 and output to the packaging unit 28 as a binary string 1701. Here, the codeword “89” is converted into the binary value “001011001”, the codeword “182” into “010110110”, and so on. Although 9-digit binary values are used here to represent the codewords, the codebook grows as encoding proceeds, so once the number of codewords exceeds 512 they can no longer be represented with 9 digits. In that case, the number of digits is increased by one at the point where the number of codewords exceeds 512, and 10-digit binary values are used. Even so, because the LZW method follows the rule that the codebook grows by exactly one entry each time a codeword is output, the decoding side can determine the number of digits: by counting the codewords it has received, it knows the number of digits in use at each point.
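The digit-count rule can be illustrated with the following sketch. It assumes, as one possible convention, that the i-th codeword (counting from 0) is written when the codebook holds 256 + i entries, so the largest codeword that can occur at that point is 255 + i. The helper names `codeword_width`, `pack`, and `unpack` are hypothetical, and bit strings are used instead of packed bytes for readability.

```python
def codeword_width(index, alphabet_size=256, min_width=9):
    """Number of binary digits used for the index-th output codeword (0-based)."""
    largest_code = alphabet_size - 1 + index   # largest codeword possible so far
    return max(min_width, largest_code.bit_length())


def pack(codes):
    """Concatenate codewords into a bit string with a growing digit count."""
    return "".join(
        format(code, "0{}b".format(codeword_width(i))) for i, code in enumerate(codes)
    )


def unpack(bits):
    """Recover the codewords by counting how many have been read so far."""
    codes, pos = [], 0
    while pos < len(bits):
        width = codeword_width(len(codes))
        codes.append(int(bits[pos:pos + width], 2))
        pos += width
    return codes


print(pack([89, 182]))  # 001011001010110110
```

The decoder needs no side information about the digit count: `unpack` derives it from the number of codewords already read, which mirrors the counting argument in the text.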
In addition, because the codebook keeps growing as encoding continues in the LZW method, its size must be limited at some point.
Various methods for addressing this problem are in widespread use. Examples include a method in which a maximum codebook size is fixed in advance and the codebook is reset to its initial state when it reaches that size, and the LZT method, in which patterns are replaced with new ones in order, starting from the pattern that has gone unused the longest. Here, the LZT method is assumed to be used.
If this LZT coding is applied to the sequence 1301, a plurality of distance values that occur in the same pattern can be represented by a single codeword, so the number of codewords becomes smaller than the number of distance values. As a result, the amount of data can be compressed.
Next, consider the characteristics of distance images. FIG. 18 shows the number of occurrences of each value from 0 to 255 in a certain distance image. As shown in FIG. 18, values occur at intervals of 6 to 7, and the values in between do not occur at all. Thus, in a distance image, not every value necessarily occurs; the values that do occur may be spaced apart.
This is due to the way the distance image is generated. When a distance image is generated by a dedicated measurement device, the graph does not take the shape shown in FIG. 18; each value occurs to some extent. When a distance image is generated by computing disparity from images of two viewpoints, on the other hand, the graph takes the shape shown in FIG. 18, with values occurring at intervals of 6 to 7. In that case, the distance image is generally generated by shifting the two viewpoint images relative to each other in steps of 1/4 pixel to 1 pixel and estimating the distance of each pixel by matching. Consequently, when the image resolution is low or the shift step of the matching is coarse, the estimated distances are not continuous. As a result, values occur at intervals of 6 to 7, as shown in FIG. 18.
The way these values occur is almost the same for every image (frame image) in a single video.
In addition, as described above, pictures that are adjacent in time are similar to each other in the case of distance images as well.
Therefore, a codebook generated for static compression coding (static codebook) can in some cases be reused for temporally adjacent pictures, which makes efficient compression coding possible.
Specifically, a method for switching between adaptive compression coding and static compression coding is described below.
First, when the image encoding unit 11 has encoded the first texture image as an I picture, information indicating that an I picture was selected (picture information #11A) and the segment table 1201 of the distance image corresponding to that texture image are input to the distance value encoding unit 25.
Then, following the adaptive compression coding algorithm described above, the distance value encoding unit 25 creates the codebook 1501 from the segment table 1201 and outputs the binary string 1701, obtained by converting each codeword of the codeword string 1601 into a 9-digit binary value. It also sets the reference flag #25A for static compression coding to “0” and outputs the flag to the packaging unit 28 together with the binary string 1701. The reference flag is set to “1” when static compression coding has been performed.
While adaptive compression coding is performed, the number of occurrences of each value in the picture is counted, and a static coding table 1901 (FIG. 19) is created that associates each value with its appearance rate and a codeword. The appearance rate in the static coding table 1901 is the number of occurrences of a value divided by the total number of segments. The codeword is the one obtained by Huffman coding based on the appearance rates. Huffman coding is a well-known technique, so a detailed description is omitted.
As the static coding table 1901 in FIG. 19 shows, Huffman coding assigns shorter codewords to values with higher appearance rates, which keeps the bit rate low. In the example shown in the static coding table 1901, the distance values “0” and “255” have low appearance rates, so their codewords are 10 bits long (“1100011110” and “1100001001”). The distance values “126” and “130”, on the other hand, have high appearance rates, so their codewords are 5 bits long (“10011” and “11010”). The distance values “1”, “125”, “127”, “128”, “129”, and “131” have an appearance rate of 0, so no codewords are assigned to them.
Of the static coding table 1901, the codebook (static codebook) 1902, which associates distance values with codewords, is then saved until the next picture is processed. The codebook 1902 need not be discarded after the next picture has been processed; it may be kept until it is no longer needed.
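The construction of a table like the static coding table 1901 can be sketched as follows. This is an illustrative Huffman construction, not the one used to produce FIG. 19: the function name `build_static_table`, the tie-breaking rule, and the toy input are assumptions, so the resulting codewords differ from those shown in the figure.

```python
import heapq
from collections import Counter


def build_static_table(values):
    """Map each occurring value to (appearance rate, Huffman codeword)."""
    if not values:
        return {}
    counts = Counter(values)
    total = len(values)
    # Heap entries: (count, smallest value in the group, {value: partial code}).
    # The second field is unique per entry, so the dicts are never compared.
    heap = [(c, v, {v: ""}) for v, c in counts.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        codes = {v: "0" for v in counts}        # single value: 1-bit codeword
    else:
        while len(heap) > 1:
            c0, t0, m0 = heapq.heappop(heap)    # two least frequent groups
            c1, t1, m1 = heapq.heappop(heap)
            merged = {v: "0" + code for v, code in m0.items()}
            merged.update({v: "1" + code for v, code in m1.items()})
            heapq.heappush(heap, (c0 + c1, min(t0, t1), merged))
        codes = heap[0][2]
    return {v: (counts[v] / total, codes[v]) for v in counts}


table = build_static_table([126] * 4 + [130] * 4 + [0, 255])
static_codebook = {v: code for v, (rate, code) in table.items()}
```

Here `table` plays the role of the static coding table (value, appearance rate, codeword) and `static_codebook` the role of the part that is saved for later pictures. Values that never occur, such as 1 or 125 in this toy input, receive no codeword.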
Next, when the image encoding unit 11 has encoded the second texture image as, for example, a P picture, information indicating that a P picture was selected, information indicating which picture was referenced, and the segment table 1201 of the distance image corresponding to that texture image are input to the distance value encoding unit 25.
Here, this P picture is assumed to refer to the immediately preceding I picture. In AVC coding, a B picture may have up to two reference pictures, so there may be two reference destinations; in that case, information on both is input.
Then, using the saved codebook 1902, Huffman coding is applied to the binary string 1701, and the number of bits of the resulting encoded data is calculated. Because the codebook 1902 was created from the appearance rate of each value in the preceding picture, the more similar the preceding picture and the current picture are, the more efficient the coding becomes. However, since the codebook 1902 was created from the preceding picture, the current picture may contain values that are not in the codebook 1902. In that case, static coding is not performed.
Alternatively, when the occurrences of each value are counted, 1 may be added to the count of every value so that a codebook with a codeword assigned to every value is created, and Huffman coding may be performed with that codebook. Because this codebook assigns a codeword to every value, it avoids the situation in which coding cannot be performed because some value has no codeword in the codebook.
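The two cases above can be sketched as follows: computing the bit count of static coding with a saved codebook, which fails when a value has no codeword, and the add-one variant of the counts that guarantees every value is covered. The function names and the toy codebook are hypothetical, and the input is assumed to be the sequence of distance values of the current picture.

```python
from collections import Counter


def static_bit_count(values, codebook):
    """Bits needed to code `values` with a saved codebook, or None if impossible."""
    total = 0
    for v in values:
        code = codebook.get(v)
        if code is None:          # value absent from the previous picture's codebook
            return None           # static coding is not performed in this case
        total += len(code)
    return total


def smoothed_counts(values, alphabet_size=256):
    """Occurrence counts with 1 added to every value, so no count is zero."""
    counts = Counter(values)
    return {v: counts[v] + 1 for v in range(alphabet_size)}


saved = {126: "0", 130: "10", 0: "11"}          # toy static codebook
print(static_bit_count([126, 126, 130], saved))  # 4
print(static_bit_count([126, 127], saved))       # None
```

A Huffman table built from `smoothed_counts` assigns a codeword to all 256 values, at the cost of slightly longer codewords for the values that actually occur.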
Next, adaptive compression coding is performed on the current picture, and the number of bits of the resulting encoded data is calculated. As with the first picture, the number of occurrences of each value in the picture is counted, and a static coding table 1901 associating each value with its appearance rate and a codeword is created.
Then, the number of bits of the encoded data obtained by adaptive compression coding is compared with the number of bits of the encoded data obtained by static compression coding (Huffman coding).
If, as a result, the encoded data produced by static compression coding has the larger number of bits, the reference flag is set to “0”, as for the first picture, and the encoded data produced by adaptive compression coding is output. If the encoded data produced by static compression coding has the smaller number of bits, the reference flag is set to “1”, and this reference flag, picture information #25B indicating the picture number of the reference destination (here, the preceding picture), and the statically compression-coded (Huffman-coded) data are output.
The flow of this operation is described with reference to FIG. 20. FIG. 20 is a flowchart showing the flow of processing in which the distance value encoding unit 25 determines the data to be output.
First, the distance value encoding unit 25 determines whether a code table (codebook) exists that can be referenced for static compression coding of the picture (S81). For the first image or an IDR picture, similarity to earlier images cannot be expected, so it is determined that no code table can be referenced. In all other cases, it is determined that a code table to reference exists.
If there is no code table to reference (NO in S81), adaptive compression coding is performed, the number of occurrences of each value in the picture is counted, and a static coding table 1901 associating each value with its appearance rate and a codeword is created (S82). The codebook 1902, which is part of the static coding table 1901, is saved. The reference flag is then set to “0” and output (S86).
If a code table to reference exists (YES in S81), static compression coding (Huffman coding) is performed using that code table, and the number of bits of the resulting encoded data is calculated. If more than one code table can be referenced, static compression coding is performed with each of them, and the number of bits of the encoded data is calculated for each.
The range of code tables to reference may be, for example, the code tables from the two pictures processed immediately before the picture being encoded. If the range is limited to the single most recent code table, the reference picture number no longer needs to be transmitted to the decoding side.
In addition, adaptive compression coding is performed, and the number of bits of the resulting encoded data is calculated. The number of occurrences of each value in the picture is also counted, and a static coding table 1901 associating each value with its appearance rate and a codeword is created. The number of bits after coding is then compared between static compression coding and adaptive compression coding (S83).
If the encoded data produced by static compression coding has the smaller number of bits (YES in S84), that data is output, the reference flag #25A is set to “1”, and the reference flag #25A and the picture information #25B indicating the reference picture are output (S85).
If the encoded data produced by static compression coding has the larger number of bits (NO in S84), the process proceeds to step S86, where the encoded data produced by adaptive compression coding is output and the reference flag is set to “0” and output.
The above is the processing in which the distance value encoding unit 25 determines the data to be output.
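The decision in steps S81 to S86 can be summarised by the sketch below. It takes the bit counts as inputs instead of running the coders, and the function name `choose_encoding`, the dictionary of candidate bit counts, and the handling of a tie (adaptive coding is kept, since the text only covers the smaller and larger cases) are assumptions for illustration.

```python
def choose_encoding(adaptive_bits, static_bits_by_picture):
    """Return (reference_flag, reference_picture, bits) for the current picture.

    static_bits_by_picture maps a reference picture number to the bit count
    obtained with that picture's static codebook, or to None when that
    codebook cannot encode the current picture.
    """
    usable = {p: b for p, b in static_bits_by_picture.items() if b is not None}
    if not usable:                                   # S81: NO (e.g. first or IDR picture)
        return 0, None, adaptive_bits                # S82, S86
    best_picture = min(usable, key=usable.get)       # S83: best static candidate
    if usable[best_picture] < adaptive_bits:         # S84: static is smaller
        return 1, best_picture, usable[best_picture] # S85: flag 1 plus picture info
    return 0, None, adaptive_bits                    # S86: flag 0, no picture info


print(choose_encoding(900, {}))                  # (0, None, 900)
print(choose_encoding(900, {3: 850, 4: 700}))    # (1, 4, 700)
print(choose_encoding(900, {4: 950}))            # (0, None, 900)
```

The `None` returned as the reference picture when the flag is 0 corresponds to the picture information not being output in that case.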
The packaging unit 28 associates the input encoded data #11 of the texture image #1, encoded data #25 of the distance image #2, reference flag #25A, and picture information #25B with one another and outputs them to the video decoding device 2 as encoded data #28. The picture information #25B is not output when the reference flag #25A is “0”.
Specifically, the packaging unit 28 integrates the encoded data #11 of the texture image and the encoded data #25 of the distance image in accordance with the NAL unit format defined in the H.264/MPEG-4 AVC standard.
FIG. 21 schematically shows the structure of a NAL unit 1801. As shown in FIG. 21, the NAL unit 1801 consists of three parts: a NAL header 1802, an RBSP 1803, and RBSP trailing bits 1804.
An identifier indicating the coding method used by the distance value encoding unit 25 is placed in the nal_unit_type field (an identifier indicating the type of NAL unit) of the NAL header 1802 of the NAL unit 1801 corresponding to each slice of the main picture (main slice). The RBSP 1803 holds the encoded data #11 and the encoded data #25. The RBSP trailing bits 1804 are adjustment bits for identifying the last bit position of the RBSP 1803.
The reference flag #25A and the picture information #25B are stored in, and transmitted with, an extended PPS (Picture Parameter Set), which is header information indicating the coding mode of the entire picture.
In the above embodiment, the video encoding device 1 encodes the texture image #1 using the AVC coding defined in the H.264/MPEG-4 AVC standard, but the present invention is not limited to this. That is, the image encoding unit 11 of the video encoding device 1 may encode the texture image #1 using another coding method such as MPEG-2 or MPEG-4.
(Operation of video encoding device)
Next, the operation of the video encoding device 1 is described with reference to FIG. 22. FIG. 22 is a flowchart showing the operation of the video encoding device 1. The operation described here is that of encoding the texture image and the distance image of the t-th frame from the beginning of a video consisting of many frames. That is, to encode the entire video, the video encoding device 1 repeats the operation described below as many times as there are frames in the video. In the following description, unless stated otherwise, each of the data #1 to #28 is to be understood as data of the t-th frame.
First, the image encoding unit 11 and the distance image division processing unit 22 receive the texture image #1 and the distance image #2, respectively, from outside the video encoding device 1 (S1). As described above, the contents of a texture image #1 and distance image #2 pair received from outside are correlated with each other, as can be seen, for example, by comparing the texture image in FIG. 3 with the distance image in FIG. 6.
Next, the image encoding unit 11 encodes the texture image #1 using the AVC coding method defined in the H.264/MPEG-4 AVC standard and outputs the resulting encoded data #11 of the texture image to the packaging unit 28 and the image decoding unit 12 (S2). Also in step S2, the image encoding unit 11 outputs to the distance value encoding unit 25 the type of picture selected and, if the selected picture is a B picture or a P picture, its reference picture.
The image decoding unit 12 then decodes the texture image #1′ from the encoded data #11 and outputs it to the image division processing unit 21 (S3). The image division processing unit 21 then defines a plurality of segments from the input texture image #1′ (S4).
Next, the image division processing unit 21 generates segment information #21, consisting of the position information of each segment, and outputs it to the distance image division processing unit 22 (S5). The position information of a segment may be, for example, the coordinate values of the pixels located on the boundary between that segment and other segments. That is, when the segments are defined from the texture image in FIG. 3, the coordinate values of the points on the contours of the closed regions in FIG. 5 are the position information of the segments.
The distance image division processing unit 22 then divides the input distance image #2 into a plurality of segments. For each segment of the distance image #2, it extracts the distance values of the pixels in that segment as a distance value set. It also associates the distance value set extracted from each segment with the position information of the corresponding segment included in the segment information #21. The distance image division processing unit 22 then outputs the resulting segment information #22 to the distance value correction unit 23 (S6, image division step).
Next, for each segment of the distance image #2, the distance value correction unit 23 calculates a representative value #23a from the distance value set of that segment included in the segment information #22. It then replaces each distance value set included in the segment information #22 with the representative value #23a of the corresponding segment and outputs the result to the number assigning unit 24 as segment information #23 (S7, representative value determination step).
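Steps S6 and S7 can be pictured with the following sketch, which groups the distance values by segment and reduces each group to one representative value. How the representative value is computed is not stated in this passage; the sketch assumes the most frequent value in the segment purely for illustration. The function name `representative_values` and the representation of segments as a label map of the same shape as the distance image are likewise hypothetical.

```python
from collections import Counter


def representative_values(distance_image, segment_labels):
    """Return {segment label: representative distance value}.

    distance_image and segment_labels are equally sized lists of rows.
    Assumption: the representative value is the most frequent value (mode).
    """
    value_sets = {}
    for row_d, row_s in zip(distance_image, segment_labels):
        for d, s in zip(row_d, row_s):
            value_sets.setdefault(s, []).append(d)    # S6: distance value set per segment
    return {
        s: Counter(ds).most_common(1)[0][0]           # S7: one value per segment
        for s, ds in value_sets.items()
    }


distance = [[10, 10, 50],
            [10, 12, 50]]
labels = [[0, 0, 1],
          [0, 0, 1]]
print(representative_values(distance, labels))  # {0: 10, 1: 50}
```

After this step each segment carries a single distance value, which is what makes the sequence passed to the distance value encoding unit short and repetitive.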
The number assigning unit 24 then, for each pair of position information and representative value #23a included in the segment information #23, associates the representative value #23a with a segment number #24 corresponding to the position information, and outputs the M pairs of representative value #23a and segment number #24 to the distance value encoding unit 25 (S8).
The distance value encoding unit 25 then encodes the input representative values #23a and segment numbers #24 and outputs the resulting encoded data #25 to the packaging unit 28 (S9, coding method selection step and encoding step).
The packaging unit 28 then integrates the encoded data #11 output by the image encoding unit 11 in step S2 with the encoded data #25 output by the distance value encoding unit 25 in step S9, and outputs the resulting encoded data #28 to the video decoding device 2 (S10).
The above is the operation of the video encoding device 1.
(Configuration of video decoding device)
Next, the video decoding device 2 according to an embodiment of the present invention is described with reference to FIGS. 23 and 24. The video decoding device 2 according to the present embodiment decodes the texture image #1′ and the distance image #2′ from the encoded data #28 transmitted from the video encoding device 1 described above. It then outputs the decoded texture image #1′ and distance image #2′ as frame images to a device that composes the video.
First, the configuration of the video decoding device 2 according to the present embodiment is described with reference to FIG. 23. FIG. 23 is a block diagram showing the main configuration of the video decoding device 2.
As shown in FIG. 23, the video decoding device 2 includes the image decoding unit 12, an image division processing unit 21′, an unpackaging unit (acquisition means) 31, a distance value decoding unit (decoding means, static codebook creation means) 32, and a distance value assigning unit (image generation means) 33.
The unpackaging unit 31 extracts the encoded data #11 of the texture image #1 and the encoded data #25 of the distance image #2 from the encoded data #28. It outputs the encoded data #11 of the texture image #1 to the image decoding unit 12 and the encoded data #25 of the distance image #2 to the distance value decoding unit 32.
The image decoding unit 12 decodes the texture image #1′ from the encoded data #11. It is identical to the image decoding unit 12 of the video encoding device 1. That is, unless noise is mixed into the encoded data #28 during transmission from the video encoding device 1 to the video decoding device 2, the image decoding unit 12 decodes a texture image #1′ with the same content as the texture image decoded by the image decoding unit 12 of the video encoding device 1. It then outputs the decoded texture image #1′.
The image decoding unit 12 also outputs the picture type of the decoded texture image #1′ and the reference picture information to the distance value decoding unit 32.
The image division processing unit 21′ divides the entire area of the texture image #1′ into a plurality of segments (regions) using the same algorithm as the image division processing unit 21 of the video encoding device 1. It then outputs segment information #21′, consisting of the position information of each segment, to the distance value assigning unit 33.
The distance value decoding unit 32 decodes the representative values #23a and the segment numbers #24 (decoded data) from the encoded data #25 of the distance image, the reference flag #25A, and the picture information #25B. This recovers the sequence 1301 of FIG. 13 that was encoded by the distance value encoding unit 25 of the moving image encoding device 1.
More specifically, when the reference flag #25A is “0”, the encoded data has been adaptively compression-encoded, so adaptive decoding is performed. When the reference flag #25A is “1”, the encoded data has been statically compression-encoded, so static decoding is performed.
Specifically, the reference flag of the first picture is “0”, so that picture has been adaptively compression-encoded. Adaptive decoding is therefore performed to decode the sequence 1301. At this time, in the same way as in the adaptive encoding performed by the distance value encoding unit 25 of the moving image encoding device 1, the number of occurrences of each distance value is counted, the static encoding table 1901 is created, and the codebook 1902 is stored until the next encoded data is processed. The storage period is not limited to this; the codebook 1902 may be kept until it is no longer needed.
Thereafter, encoded data input together with a reference flag #25A of “1” is statically decoded using the stored codebook 1902 that serves as the reference. The codebook 1902 to be referenced is the one corresponding to the picture number indicated by the picture information #25B input together with the reference flag #25A.
In this way, the codebook 1902 for static decoding can be created in the moving image decoding device 2 without being transmitted from the moving image encoding device 1 to the moving image decoding device 2. The encoded data can therefore be transmitted with a greatly reduced amount of information.
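The switch between adaptive and static decoding, and the reuse of a codebook that is rebuilt on the decoder side, can be sketched as follows. This is only an illustrative sketch under stated assumptions: a Huffman code built from occurrence counts stands in for the static encoding table 1901 and codebook 1902, the adaptive decoder is passed in as a hypothetical callable, and the names build_codebook, static_decode and DistanceValueDecoder do not appear in the specification.

```python
import heapq
from collections import Counter
from itertools import count


def build_codebook(values):
    """Build a prefix code {value: bit string} from occurrence counts.

    Assumption: a Huffman code stands in for codebook 1902.
    """
    counts = Counter(values)
    if len(counts) == 1:                       # degenerate case: a single symbol
        return {next(iter(counts)): "0"}
    tiebreak = count()                         # keeps heap entries comparable
    heap = [(n, next(tiebreak), {v: ""}) for v, n in sorted(counts.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)
        n1, _, c1 = heapq.heappop(heap)
        merged = {v: "0" + b for v, b in c0.items()}
        merged.update({v: "1" + b for v, b in c1.items()})
        heapq.heappush(heap, (n0 + n1, next(tiebreak), merged))
    return heap[0][2]


def static_encode(values, codebook):
    """Encoder-side counterpart, included only to exercise the decoder."""
    return "".join(codebook[v] for v in values)


def static_decode(bits, codebook):
    """Decode a bit string with a fixed (static) codebook."""
    inverse = {b: v for v, b in codebook.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


class DistanceValueDecoder:
    def __init__(self, adaptive_decode):
        self.adaptive_decode = adaptive_decode   # hypothetical adaptive decoder
        self.codebooks = {}                      # picture number -> codebook

    def decode(self, data, reference_flag, picture_number, ref_picture=None):
        if reference_flag == 0:
            # Adaptive decoding, then count occurrences and keep the codebook.
            values = self.adaptive_decode(data)
            self.codebooks[picture_number] = build_codebook(values)
            return values
        # Static decoding with the codebook of the referenced picture.
        return static_decode(data, self.codebooks[ref_picture])
```

Because the encoder and the decoder derive the codebook from the same decoded values with the same deterministic procedure, only the reference flag and the picture number need to accompany the statically encoded data. Values that never occurred in the referenced picture are not handled in this sketch.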
The segment table 1201 of FIG. 12 is thereby decoded, and this segment table 1201 is output to the distance value assigning unit 33.
Based on the input representative values #23a and segment numbers #24, the distance value assigning unit 33 restores the distance image #2′ by assigning, to the pixels included in each segment, the pixel value (distance value) that is the representative value of that segment. It then outputs the restored distance image #2′.
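The assignment performed by the distance value assigning unit 33 amounts to a per-pixel table lookup. A minimal sketch, assuming the segmentation is available as a 2D array of segment numbers and the decoded data as a mapping from segment number to representative value (both data layouts, and the function name, are assumptions made for illustration):

```python
def assign_distance_values(segment_map, representative):
    """Restore a distance image from a segment map.

    segment_map    : 2D list, segment number of each pixel (assumed layout)
    representative : dict {segment number: representative distance value}
    """
    return [[representative[seg] for seg in row] for row in segment_map]


segment_map = [
    [0, 0, 1],
    [2, 2, 1],
]
representative = {0: 120, 1: 64, 2: 200}
distance_image = assign_distance_values(segment_map, representative)
```

Every pixel of a segment receives the same distance value, which is why no per-pixel data needs to be transmitted for the distance image.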
In this way, the texture image #1 and the distance image #2 can be decoded.
For example, when the screen of the texture image is divided into segments by the method described below and the input texture image is a 1024 × 768 dot image, it can be divided into several thousand segments (for example, 3000 to 5000). In the AVC encoding method, by contrast, the total number of blocks (4 × 4 = 16 pixels) is about 49000.
Specifically, from the input texture image #1′, the image division processing unit 21 defines a plurality of segments such that, for each segment, the difference between the average value calculated from the pixel values of the pixels in that segment and the average value calculated from the pixel values of the pixels in an adjacent segment is equal to or less than a predetermined threshold.
A specific algorithm for defining a plurality of segments in which the difference between the above average values is equal to or greater than a predetermined threshold is described below with reference to FIGS. 28 and 29.
FIG. 28 is a flowchart showing the operation by which the moving image encoding device 1 defines a plurality of segments based on the above algorithm. FIG. 29 is a flowchart showing the subroutine for the segment combining process in the flowchart of FIG. 28.
For the texture image that has undergone smoothing, in the initialization step of FIG. 28 the image division processing unit 21 defines one independent segment (provisional segment) for each of the pixels in the texture image, and sets the pixel value of the corresponding pixel itself as the average value (average color) of all pixel values in each provisional segment (S41).
Next, the process proceeds to the segment combining step (S42), in which provisional segments with similar colors are combined. This segment combining process is described in detail below with reference to FIG. 29; it is repeated until no further combining occurs.
The image division processing unit 21 performs the following processing (S51 to S55) for all provisional segments.
First, the image division processing unit 21 determines whether the height and the width of the provisional segment of interest are both equal to or less than a threshold (S51). If both are determined to be equal to or less than the threshold (YES in S51), the process proceeds to step S52. If either is determined to be greater than the threshold (NO in S51), step S51 is performed for the next provisional segment of interest. The next provisional segment of interest may be, for example, the provisional segment that follows the current one in raster scan order.
From the provisional segments adjacent to the provisional segment of interest, the image division processing unit 21 selects the one whose average color is closest to the average color of the provisional segment of interest (S52). As an index of color closeness, for example, the Euclidean distance between vectors can be used, with the three RGB values of a pixel value regarded as a three-dimensional vector. The pixel value used for each segment is the average of all pixel values included in that segment.
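The selection in step S52 can be illustrated with a short sketch. The function name and the dictionary of neighbor colors are assumptions made for illustration; the Euclidean distance over (R, G, B) is the index the text gives as an example.

```python
import math


def closest_neighbor(target_color, neighbor_colors):
    """Return (segment id, distance) of the neighbor closest in average color.

    target_color    : (R, G, B) average color of the segment of interest
    neighbor_colors : dict {segment id: (R, G, B) average color}
    """
    seg_id = min(
        sorted(neighbor_colors),                        # fixed order for ties
        key=lambda s: math.dist(target_color, neighbor_colors[s]),
    )
    return seg_id, math.dist(target_color, neighbor_colors[seg_id])


best, distance = closest_neighbor(
    (100, 100, 100),
    {1: (103, 104, 100), 2: (90, 100, 100), 3: (100, 100, 120)},
)
```

Here segment 1 is selected because its color distance (5) is smaller than that of segment 2 (10) or segment 3 (20).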
After step S52, the image division processing unit 21 determines whether the color distance between the provisional segment of interest and the provisional segment judged closest in color is equal to or less than a certain threshold (S53). If it is greater than the threshold (NO in S53), step S51 is performed for the next provisional segment of interest. If it is equal to or less than the threshold (YES in S53), the process proceeds to step S54.
After step S53, the image division processing unit 21 combines the two provisional segments (the provisional segment of interest and the provisional segment judged closest in color to it) into a single provisional segment (S54). Step S54 reduces the number of provisional segments by one.
After step S54, the average of the pixel values of all pixels included in the combined segment is calculated (S55). If any segment has not yet undergone steps S51 to S55, step S51 is performed for the next provisional segment of interest.
After steps S51 to S55 have been completed for all provisional segments, the process proceeds to step S43.
The image division processing unit 21 compares the number of provisional segments before step S42 with the number of provisional segments after step S42 (S43).
If the number of provisional segments has decreased (YES in S43), the process returns to step S42. If the number of provisional segments is unchanged (NO in S43), the image division processing unit 21 defines each current provisional segment as one segment.
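The flow of FIGS. 28 and 29 (S41 to S43 and S51 to S55) can be sketched as below. This is a simplified illustration, not the implementation of the image division processing unit 21: 4-connectivity for adjacency, visiting segments in order of their identifier, a single size threshold for both height and width, and the name segment_image are all assumptions, and no attempt is made at efficiency.

```python
import math


def segment_image(pixels, color_threshold, size_threshold):
    """pixels: 2D list of (R, G, B) tuples. Returns a 2D list of segment ids."""
    h, w = len(pixels), len(pixels[0])
    # S41: every pixel starts as its own provisional segment.
    label = [[y * w + x for x in range(w)] for y in range(h)]
    members = {y * w + x: [(y, x)] for y in range(h) for x in range(w)}
    mean = {y * w + x: tuple(map(float, pixels[y][x]))
            for y in range(h) for x in range(w)}

    def neighbors(seg):
        found = set()
        for y, x in members[seg]:
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and label[ny][nx] != seg:
                    found.add(label[ny][nx])
        return found

    def merge_pass():                       # S42, detailed in FIG. 29
        for seg in sorted(members):         # snapshot of the current segments
            if seg not in members:          # absorbed earlier in this pass
                continue
            ys = [y for y, _ in members[seg]]
            xs = [x for _, x in members[seg]]
            height, width = max(ys) - min(ys) + 1, max(xs) - min(xs) + 1
            if height > size_threshold or width > size_threshold:
                continue                    # S51: NO
            adjacent = neighbors(seg)
            if not adjacent:
                continue
            best = min(sorted(adjacent),    # S52: closest average color
                       key=lambda s: math.dist(mean[seg], mean[s]))
            if math.dist(mean[seg], mean[best]) > color_threshold:
                continue                    # S53: NO
            for y, x in members[best]:      # S54: combine the two segments
                label[y][x] = seg
            members[seg].extend(members.pop(best))
            del mean[best]
            n = len(members[seg])           # S55: recompute the average color
            mean[seg] = tuple(
                sum(pixels[y][x][c] for y, x in members[seg]) / n
                for c in range(3))

    while True:
        before = len(members)
        merge_pass()
        if len(members) == before:          # S43: NO, segmentation is final
            return label
```

For a small image whose left half is reddish and whose right half is bluish, a call such as segment_image(pixels, color_threshold=10, size_threshold=4) would be expected to end with one segment per half, since the color distance between the halves exceeds the threshold.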
With the above algorithm, as noted earlier, an input texture image of 1024 × 768 dots can be divided into several thousand segments (for example, 3000 to 5000).
As described above, the segments are used to divide the distance image. If a segment becomes too large, it will contain a variety of distance values, producing pixels with a large error from the representative value and lowering the encoding accuracy of the distance image. Therefore, although step S51 is not essential to the present invention, it is desirable to limit the segment size as in step S51 so that segments do not become too large.
Thus, whereas the above algorithm divides the image into several thousand segments (for example, 3000 to 5000), the AVC encoding method yields a total of about 49000 blocks (4 × 4 = 16 pixels). For each of these blocks, an orthogonal transform is performed and the coefficients are quantized and transmitted.
In this embodiment, therefore, the number of segments can be made far smaller than the number of orthogonal-transform processing units. In addition, because the distance value within each segment is constant, no orthogonal transform is needed and the distance value can be transmitted as 8 bits of information. Furthermore, this embodiment improves compression efficiency further by using adaptive compression encoding and by reusing the codebook. Compared with encoding both the texture video (images) and the distance video (images) with the AVC encoding method, this embodiment can therefore greatly improve compression efficiency.
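The figures quoted above can be checked with a few lines of arithmetic. The sketch only reproduces the counts stated in the text (a 1024 × 768 image, 4 × 4 blocks, 3000 to 5000 segments, 8 bits per representative value); the function names are illustrative, and the bit count is the raw size before any entropy coding.

```python
def avc_block_count(width, height, block=4):
    """Number of block x block units covering the image."""
    return (width // block) * (height // block)


def raw_distance_bits(segments, bits_per_value=8):
    """Bits needed for one representative value per segment, uncompressed."""
    return segments * bits_per_value


blocks = avc_block_count(1024, 768)            # 256 * 192 = 49152
for segments in (3000, 5000):
    ratio = blocks / segments                   # roughly 16x and 10x fewer units
    print(segments, round(ratio, 1), raw_distance_bits(segments))
```

With 3000 to 5000 segments, the representative values occupy 24000 to 40000 bits per distance image before compression, and the number of units to process is roughly 10 to 16 times smaller than the 49152 blocks of the 4 × 4 partition.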
(Operation of the moving image decoding device)
Next, the operation of the moving image decoding device 2 is described with reference to FIG. 24. FIG. 24 is a flowchart showing the operation of the moving image decoding device 2. The operation described here is that of decoding the texture image and the distance image of the t-th frame from the beginning of a three-dimensional moving image consisting of many frames. That is, to decode the entire moving image, the moving image decoding device 2 repeats the operation described below as many times as the moving image has frames. In the following description, unless otherwise stated, each of the data #1 to #28 is to be interpreted as data of the t-th frame.
First, the unpackaging unit 31 extracts the encoded data #11 of the texture image, the encoded data #25 of the distance image, the reference flag #25A, and the picture information #25B from the encoded data #28 received from the moving image encoding device 1. The unpackaging unit 31 then outputs the encoded data #11 to the image decoding unit 12, and outputs the encoded data #25, the reference flag #25A, and the picture information #25B to the distance value decoding unit 32 (S21, acquisition step).
The image decoding unit 12 decodes the texture image #1′ from the input encoded data #11 and outputs it to the image division processing unit 21′ and to a stereoscopic video display device (not shown) external to the moving image decoding device 2 (S22). The image decoding unit 12 also outputs picture information #11A, which indicates the type of the selected picture and its reference picture, to the distance value decoding unit 32.
Next, the image division processing unit 21′ defines a plurality of segments using the same algorithm as the image division processing unit 21 of the moving image encoding device 1.
Then, for each segment, in raster scan order within the texture image #1′, the image division processing unit 21′ generates a segment identification image #21′ by replacing the pixel value of each pixel in the segment with a representative value. The image division processing unit 21′ outputs the segment identification image #21′ to the distance value assigning unit 33 (S23).
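One way to picture the segment identification image is as a label map whose identifying values are assigned in raster scan order. The sketch below assumes that the identifying value of a segment is simply the order in which the segment is first encountered during a raster scan; this numbering rule and the function name are assumptions for illustration and are not taken from the specification.

```python
def number_segments_in_raster_order(label_map):
    """Renumber arbitrary segment labels as 0, 1, 2, ... in the order in
    which each segment first appears during a raster scan."""
    numbering = {}
    result = []
    for row in label_map:
        out_row = []
        for lab in row:
            if lab not in numbering:
                numbering[lab] = len(numbering)   # next unused number
            out_row.append(numbering[lab])
        result.append(out_row)
    return result


identification_image = number_segments_in_raster_order([
    [5, 5, 2],
    [5, 7, 2],
])
```

Under this assumption, an encoder and a decoder that run the same segmentation on the same decoded texture image #1′ would arrive at the same numbering without exchanging it.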
Meanwhile, the distance value decoding unit 32 decodes the binary string 1701 described above from the encoded data #25 of the distance image, the reference flag #25A, and the picture information #25B. The distance value decoding unit 32 further decodes the segment numbers and the representative values #23a from the binary string 1701. It then outputs the obtained representative values #23a and segment numbers #24 to the distance value assigning unit 33 (S24, decoding step).
Based on the input representative values #23a and segment numbers #24, the distance value assigning unit 33 decodes the distance image #2′ by converting the pixel values of all pixels in the segment identification image #21′ into the representative value #23a of the segment containing each pixel. The distance value assigning unit 33 then outputs the distance image #2′ to the stereoscopic video display device mentioned above (S25, image generation step).
The operation of the moving image decoding device 2 has been described above. The distance image #2′ decoded by the distance value assigning unit 33 in step S25 is, in general, a distance image that approximates the distance image #2 input to the moving image encoding device 1.
This is because, as described earlier, the correlation between the texture image #1 and the distance image #2 means that “when the texture image #1′ is divided into a plurality of segments such that each segment consists of pixels of similar color, all or nearly all of the pixels included in a single segment of the distance image #2 tend to have the same distance value.” In other words, the distance image #2′ is identical to the image obtained by changing a very small fraction of the distance values contained in each segment of the distance image #2 to the representative value of that segment, so the distance image #2′ can be said to approximate the distance image #2.
A moving image transmission system including the moving image encoding device 1 and the moving image decoding device 2 described above also provides the effects described above.
[Embodiment 2]
Another embodiment of the present invention is described below with reference to FIGS. 25 to 27. For convenience of explanation, members having the same functions as those shown in Embodiment 1 above are given the same reference numerals, and their description is omitted.
This embodiment differs from Embodiment 1 in that texture images, and the distance images corresponding to them, exist for a plurality of viewpoints. That is, the moving image encoding device 1A according to this embodiment encodes texture images and distance images using the same encoding method as the moving image encoding device 1 of Embodiment 1, but differs from the moving image encoding device 1 in that it encodes a plurality of sets of texture images and distance images per frame.
Here, the plurality of sets of texture images and distance images are images of a subject captured simultaneously by cameras and ranging devices installed at a plurality of locations surrounding the subject. That is, the plurality of sets of texture images and distance images are images for generating a free-viewpoint image. Each set of texture image and distance image contains, along with the actual data of that texture image and distance image, camera parameters such as the camera position, direction, and focal length as metadata.
(Configuration of the moving image encoding device)
First, the configuration of the moving image encoding device 1A is described with reference to FIG. 25. FIG. 25 is a block diagram showing the main configuration of the moving image encoding device 1A of this embodiment.
As shown in FIG. 25, the moving image encoding device 1A includes an image encoding unit (MVC encoding means) 11A, an image decoding unit (MVC decoding means) 12A, a distance image encoding unit 20A, and a packaging unit 28′. The distance image encoding unit 20A includes an image division processing unit 21, a distance image division processing unit 22, a distance value correction unit 23, a number assigning unit 24, and a distance value encoding unit (adaptive encoding means, output means) 25A.
The image encoding unit 11A performs the same kind of encoding as the image encoding unit 11 described above, but differs in that it compression-encodes images from a plurality of viewpoints. Specifically, the image encoding unit 11A encodes using MVC (Multiview Video Coding). Whereas AVC, used in Embodiment 1, is a standard for compression-encoding video (images) from a single viewpoint, MVC is a standard for compression-encoding multi-view video (images). The encoded data #11 output from the image encoding unit 11A is therefore MVC-encoded data.
To eliminate redundancy between viewpoints, MVC encoding applies the prediction described in Embodiment 1 between viewpoints as well. This is described concretely with reference to FIG. 26. FIG. 26 is a diagram for explaining MVC encoding.
As shown in FIG. 26, the encoding target image 2301 is predicted block by block from both the time direction and the viewpoint direction (spatial direction). The figure shows that images 2303 and 2305 can be referenced as images in the time direction, and that images 2302 and 2304 can be referenced as images in the viewpoint direction.
A change in the position of the subject within the screen over time is equivalent to a change in the position of the subject within the screen due to the viewpoint, so the same prediction method as for image prediction in the time direction can be applied between viewpoints.
The same technique can therefore be used to eliminate both redundancy between images in the time direction and redundancy between images in the spatial direction.
When prediction is performed in the time direction and the spatial direction as described above, reference images arise in the spatial direction just as they do in the time direction. Up to two reference images can be referenced in the spatial direction as well.
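The set of candidate reference images for one encoding target can be sketched as follows. The sketch assumes that the candidates are the immediately preceding and following frames of the same viewpoint and the same-time images of the two adjacent viewpoints; this layout is an assumption about FIG. 26, and the function name and the (view, time) indexing are illustrative only.

```python
def reference_candidates(view, time, num_views, num_frames):
    """Return (temporal, inter_view) candidate lists of (view, time) pairs.

    At most two candidates are listed in each direction, matching the limit
    of two reference images per direction stated in the text.
    """
    temporal = [(view, t) for t in (time - 1, time + 1)
                if 0 <= t < num_frames]
    inter_view = [(v, time) for v in (view - 1, view + 1)
                  if 0 <= v < num_views]
    return temporal, inter_view


# Centre image of a 3-view, 3-frame arrangement: two candidates per direction.
temporal, inter_view = reference_candidates(1, 1, num_views=3, num_frames=3)
```

At the edge of the arrangement, for example view 0 at time 0, only one candidate remains in each direction.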
Accordingly, as in Embodiment 1, the image encoding unit 11A outputs the picture type, the picture number of the reference destination, and the viewpoint number of the reference destination to the distance value encoding unit 25A.
Like the image decoding unit 12, the image decoding unit 12A decodes the texture image #1′ from the encoded data #11 of the texture image #1 obtained from the image encoding unit 11A. It then outputs the texture image #1′ to the image division processing unit 21.
Like the distance value encoding unit 25, the distance value encoding unit 25A applies compression encoding to the data in which the segment numbers #24 and the representative values #23a are associated, and outputs the resulting encoded data #25 to the packaging unit 28′.
The packaging unit 28′ integrates the encoded data #11 (-1 to -N) of each of the texture images #1-1 to #1-N, the encoded data #25 (-1 to -N) of the distance values of the corresponding distance images, the reference flags #25A (-1 to -N), and the picture information #25B (-1 to -N) to generate the encoded data #28′. The packaging unit 28′ then transmits the generated encoded data #28′ to the moving image decoding device 2A.
(Configuration of the moving image decoding device)
Next, the configuration of the moving image decoding device 2A of this embodiment is described with reference to FIG. 27. FIG. 27 is a block diagram showing the main configuration of the moving image decoding device 2A.
As shown in FIG. 27, the moving image decoding device 2A includes an image decoding unit 12A, an image division processing unit 21′, an unpackaging unit 31′, a distance value decoding unit 32A, and a distance value assigning unit 33.
Upon receiving the encoded data #28′, the unpackaging unit 31′ extracts the encoded data #11 (-1 to -N), the encoded data #25 (-1 to -N), the reference flags #25A (-1 to -N), and the picture information #25B (-1 to -N), and outputs the encoded data #11 to the image decoding unit 12A and the encoded data #25 to the distance value decoding unit 32A.
The rest of the configuration is the same as that of the moving image decoding device 2, except that texture images and distance images exist for a plurality of viewpoints. The moving image decoding device 2A likewise determines the reference image according to the algorithm shown in FIG. 24, reuses the codebook to decode the distance values, and restores the distance image.
The present invention is not limited to the embodiments described above; various modifications are possible within the scope of the claims, and embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.
(Application examples)
The moving image decoding device 1 and the moving image encoding device 2 described above can be mounted on and used in various devices that transmit, receive, record, and reproduce moving images.
First, the use of the moving image decoding device 1 and the moving image encoding device 2 described above for transmitting and receiving moving images is described with reference to FIG. 30.
FIG. 30(a) is a block diagram showing the configuration of a transmitting device A equipped with the moving image encoding device 2. As shown in FIG. 30(a), the transmitting device A includes an encoding unit A1 that obtains encoded data by encoding a moving image, a modulation unit A2 that obtains a modulated signal by modulating a carrier wave with the encoded data obtained by the encoding unit A1, and a transmission unit A3 that transmits the modulated signal obtained by the modulation unit A2. The moving image encoding device 2 described above is used as this encoding unit A1.
As sources of the moving image input to the encoding unit A1, the transmitting device A may further include a camera A4 that captures moving images, a recording medium A5 on which moving images are recorded, and an input terminal A6 for inputting moving images from outside. FIG. 30(a) illustrates a configuration in which the transmitting device A includes all of these, but some may be omitted.
Note that the recording medium A5 may hold an unencoded moving image, or may hold a moving image encoded with a recording encoding scheme that differs from the transmission encoding scheme. In the latter case, a decoding unit (not shown) that decodes the encoded data read from the recording medium A5 in accordance with the recording encoding scheme is preferably interposed between the recording medium A5 and the encoding unit A1.
FIG. 30(b) is a block diagram showing the configuration of a receiving device B equipped with the moving image decoding device 1. As shown in FIG. 30(b), the receiving device B includes a receiving unit B1 that receives a modulated signal, a demodulation unit B2 that obtains encoded data by demodulating the modulated signal received by the receiving unit B1, and a decoding unit B3 that obtains a moving image by decoding the encoded data obtained by the demodulation unit B2. The moving image decoding device 1 described above is used as the decoding unit B3.
As destinations of the moving image output by the decoding unit B3, the receiving device B may further include a display B4 that displays the moving image, a recording medium B5 for recording the moving image, and an output terminal B6 for outputting the moving image to the outside. FIG. 30(b) illustrates a configuration in which the receiving device B includes all of these, but some of them may be omitted.
Note that the recording medium B5 may be for recording an unencoded moving image, or may be for recording a moving image encoded with a recording encoding scheme that differs from the transmission encoding scheme. In the latter case, an encoding unit (not shown) that encodes the moving image obtained from the decoding unit B3 in accordance with the recording encoding scheme is preferably interposed between the decoding unit B3 and the recording medium B5.
Note that the transmission medium carrying the modulated signal may be wireless or wired. The transmission mode may be broadcasting (here, a mode in which the destination is not specified in advance) or communication (here, a mode in which the destination is specified in advance). That is, the modulated signal may be transmitted by any of wireless broadcasting, wired broadcasting, wireless communication, and wired communication.
For example, a broadcasting station (broadcasting equipment, etc.) and a receiving station (television receiver, etc.) for terrestrial digital broadcasting are an example of a transmitting device A and a receiving device B that exchange modulated signals by wireless broadcasting. Likewise, a broadcasting station (broadcasting equipment, etc.) and a receiving station (television receiver, etc.) for cable television broadcasting are an example of a transmitting device A and a receiving device B that exchange modulated signals by wired broadcasting.
A server (workstation, etc.) and a client (television receiver, personal computer, smartphone, etc.) of a VOD (Video On Demand) service or a video sharing service using the Internet are an example of a transmitting device A and a receiving device B that exchange modulated signals by communication (usually, a LAN uses either a wireless or a wired transmission medium, and a WAN uses a wired one). Here, personal computers include desktop PCs, laptop PCs, and tablet PCs, and smartphones include multifunction mobile phone terminals.
Note that a client of a video sharing service has, in addition to the function of decoding encoded data downloaded from the server and showing it on a display, the function of encoding a moving image captured by a camera and uploading it to the server. That is, a client of a video sharing service functions as both the transmitting device A and the receiving device B.
Next, how the moving image decoding device 1 and the moving image encoding device 2 described above can be used for recording and playing back moving images is described with reference to FIG. 31.
FIG. 31(a) is a block diagram showing the configuration of a recording device C equipped with the moving image encoding device 2 described above. As shown in FIG. 31(a), the recording device C includes an encoding unit C1 that obtains encoded data by encoding a moving image, and a writing unit C2 that writes the encoded data obtained by the encoding unit C1 to a recording medium M. The moving image encoding device 2 described above is used as the encoding unit C1.
Note that the recording medium M may be (1) of a type built into the recording device C, such as an HDD (Hard Disk Drive) or SSD (Solid State Drive), (2) of a type connected to the recording device C, such as an SD memory card or USB (Universal Serial Bus) flash memory, or (3) of a type loaded into a drive device (not shown) built into the recording device C, such as a DVD (Digital Versatile Disc) or BD (Blu-ray Disc: registered trademark).
As sources of the moving image input to the encoding unit C1, the recording device C may further include a camera C3 that captures a moving image, an input terminal C4 for inputting a moving image from outside, and a receiving unit C5 for receiving a moving image. FIG. 31(a) illustrates a configuration in which the recording device C includes all of these, but some of them may be omitted.
Note that the receiving unit C5 may receive an unencoded moving image, or may receive encoded data encoded with a transmission encoding scheme that differs from the recording encoding scheme. In the latter case, a transmission decoding unit (not shown) that decodes the encoded data encoded with the transmission encoding scheme is preferably interposed between the receiving unit C5 and the encoding unit C1.
Examples of such a recording device C include a DVD recorder, a BD recorder, and an HD (Hard Disk) recorder (in these cases, the input terminal C4 or the receiving unit C5 is the main source of moving images). A camcorder (in which the camera C3 is the main source of moving images), a personal computer (in which the receiving unit C5 is the main source), and a smartphone (in which the camera C3 or the receiving unit C5 is the main source) are also examples of such a recording device C.
FIG. 31(b) is a block diagram showing the configuration of a playback device D equipped with the moving image decoding device 1 described above. As shown in FIG. 31(b), the playback device D includes a reading unit D1 that reads encoded data written on the recording medium M, and a decoding unit D2 that obtains a moving image by decoding the encoded data read by the reading unit D1. The moving image decoding device 1 described above is used as the decoding unit D2.
Note that the recording medium M may be (1) of a type built into the playback device D, such as an HDD or SSD, (2) of a type connected to the playback device D, such as an SD memory card or USB flash memory, or (3) of a type loaded into a drive device (not shown) built into the playback device D, such as a DVD or BD.
As destinations of the moving image output by the decoding unit D2, the playback device D may further include a display D3 that displays the moving image, an output terminal D4 for outputting the moving image to the outside, and a transmission unit D5 that transmits the moving image. FIG. 31(b) illustrates a configuration in which the playback device D includes all of these, but some of them may be omitted.
Note that the transmission unit D5 may transmit an unencoded moving image, or may transmit encoded data encoded with a transmission encoding scheme that differs from the recording encoding scheme. In the latter case, an encoding unit (not shown) that encodes the moving image with the transmission encoding scheme is preferably interposed between the decoding unit D2 and the transmission unit D5.
Examples of such a playback device D include a DVD player, a BD player, and an HDD player (in these cases, the output terminal D4, to which a television receiver or the like is connected, is the main destination of moving images). A television receiver (in which the display D3 is the main destination of moving images), a desktop PC (in which the output terminal D4 or the transmission unit D5 is the main destination), a laptop or tablet PC (in which the display D3 or the transmission unit D5 is the main destination), and a smartphone (in which the display D3 or the transmission unit D5 is the main destination) are also examples of such a playback device D.
As described above, the moving image encoding device 1 of the present invention includes: a distance image division processing unit 22 that divides each frame image of a moving image into a plurality of regions; a number assigning unit 24 that determines a representative value for each region divided by the distance image division processing unit 22; and a distance value encoding unit 25 that selects either adaptive encoding, which encodes a sequence of the representative values determined by the number assigning unit 24 arranged in a predetermined order while adaptively updating an adaptive codebook that associates sequence patterns with codewords, or static encoding, which arranges the representative values determined by the number assigning unit 24 in a predetermined order and encodes them by assigning to each representative value a codeword whose number of bits depends on the appearance rate of that representative value in the frame image, and that generates encoded data of the frame image using the selected encoding scheme.
As described above, the moving image encoding device according to the present invention is a moving image encoding device that encodes a moving image, and includes: image dividing means for dividing each frame image of the moving image into a plurality of regions; representative value determining means for determining a representative value of each region divided by the image dividing means; encoding means for generating encoded data by performing, for each frame image, at least one of adaptive encoding, which encodes a sequence of the representative values determined by the representative value determining means arranged in a predetermined order while adaptively updating an adaptive codebook that associates sequence patterns with codewords, and static encoding, which arranges the representative values determined by the representative value determining means in a predetermined order and encodes them by assigning to each representative value a codeword whose number of bits depends on the appearance rate of that representative value in the frame image; and encoding scheme selecting means for selecting, for each frame image, either the adaptive encoding or the static encoding, wherein the encoding means encodes the frame image using the encoding scheme selected by the encoding scheme selecting means and thereby generates the encoded data.
A method of controlling a moving image encoding device according to the present invention is a method of controlling a moving image encoding device that encodes a moving image, and includes, in the moving image encoding device: an image dividing step of dividing each frame image of the moving image into a plurality of regions; a representative value determining step of determining a representative value of each region divided in the image dividing step; an encoding scheme selecting step of selecting, for each frame image, either adaptive encoding, which encodes a sequence of the representative values determined in the representative value determining step arranged in a predetermined order while adaptively updating an adaptive codebook that associates sequence patterns with codewords, or static encoding, which arranges the representative values determined in the representative value determining step in a predetermined order and encodes them by assigning to each representative value a codeword whose number of bits depends on the appearance rate of that representative value in the frame image; and an encoding step that performs at least one of the adaptive encoding and the static encoding, and that encodes the frame image using the encoding scheme selected in the encoding scheme selecting step to generate encoded data.
According to the above configuration or method, each frame image of the moving image is divided into a plurality of regions, and a representative value is determined for each of the divided regions. Then, for each frame image, the representative values are arranged in a predetermined order, and either adaptive encoding or static encoding is selected for encoding them. Each frame image is then encoded with the selected encoding scheme.
Here, the predetermined order is an order from which the position in the frame image of the region corresponding to each representative value can be identified. For example, the regions may be ordered by when any pixel belonging to each region is first encountered in a raster scan of the frame image.
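The raster-scan ordering can be illustrated with a minimal sketch. The label map `labels`, the mapping `representative`, and the function name `region_order` are hypothetical names introduced only for this illustration; the specification does not prescribe this data layout.

```python
def region_order(labels):
    """Return region ids in the order their first pixel is met in a raster scan."""
    seen = set()
    order = []
    for row in labels:            # scan rows from top to bottom
        for region_id in row:     # scan pixels from left to right
            if region_id not in seen:
                seen.add(region_id)
                order.append(region_id)
    return order


# Hypothetical 3x4 frame in which each pixel stores the id of its region.
labels = [
    [0, 0, 1, 1],
    [2, 0, 1, 1],
    [2, 2, 3, 1],
]
# Hypothetical representative value for each region.
representative = {0: 120, 1: 45, 2: 200, 3: 45}

# The sequence of representative values that would be handed to the encoder.
sequence = [representative[r] for r in region_order(labels)]
```

Because a decoder that holds the same region map can repeat the same scan, the position of each value in `sequence` is enough to identify the region it belongs to.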
As a result, either adaptive encoding or static encoding can be selected for each frame image at encoding time, so that each frame image can be encoded with the more suitable encoding scheme.
For example, selecting the encoding scheme that yields the smaller amount of information after encoding produces more highly compressed encoded data. Alternatively, selecting the encoding scheme with fewer processing steps makes encoding more efficient.
The moving image encoding device according to the present invention may include static codebook creating means that, when the adaptive encoding is performed, calculates the appearance rate of each representative value in the frame image being adaptively encoded, determines from the calculated appearance rate the number of bits of the codeword to be assigned to each representative value, and creates a static codebook that associates each representative value with a codeword of the determined number of bits, and, when the encoding scheme selecting means selects static encoding, the encoding means may statically encode the frame image to be encoded using a static codebook that the static codebook creating means created when an earlier frame image was adaptively encoded.
According to the above configuration, a previously created static codebook can be used when static encoding is performed. This removes the need to create a new static codebook and thus improves the efficiency of static encoding.
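As a sketch of how a static codebook with appearance-rate-dependent codeword lengths might be built, the example below uses Huffman coding. Huffman coding is an assumption chosen for illustration; the text only requires that the number of bits depend on the appearance rate. The function names are likewise hypothetical.

```python
import heapq
from collections import Counter


def build_static_codebook(values):
    """Map each value to a prefix-free bit string; frequent values get shorter codes."""
    counts = Counter(values)
    if len(counts) == 1:
        # A single distinct value still needs a one-bit codeword.
        return {next(iter(counts)): "0"}
    # Heap entries are (count, tie_breaker, {value: partial_code}).
    heap = [(c, i, {v: ""}) for i, (v, c) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c0, _, low = heapq.heappop(heap)    # least frequent subtree
        c1, _, high = heapq.heappop(heap)   # next least frequent subtree
        merged = {v: "0" + code for v, code in low.items()}
        merged.update({v: "1" + code for v, code in high.items()})
        heapq.heappush(heap, (c0 + c1, tie, merged))
        tie += 1
    return heap[0][2]


def static_encode(values, codebook):
    """Concatenate the codewords of the values, in order."""
    return "".join(codebook[v] for v in values)


# Representative values of a hypothetical frame that was adaptively encoded.
previous_frame = [45, 45, 45, 45, 120, 120, 200]
codebook = build_static_codebook(previous_frame)

# A later frame reuses the codebook without rebuilding it.
bits = static_encode([45, 120, 45, 200], codebook)
```

The codebook is derived from the earlier frame only, so the later frame is encoded with a plain table lookup.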
In the moving image encoding device according to the present invention, the encoding scheme selecting means may select, of adaptive encoding and static encoding, the encoding scheme that yields the smaller amount of information after encoding the frame image to be encoded.
According to the above configuration, encoding is performed with the encoding scheme that yields the smaller amount of encoded data. Encoding can therefore be performed with the scheme that achieves the higher compression ratio.
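The size-based selection can be sketched as follows. The adaptive coder here is a simple LZW-style bit counter, used only as an assumed stand-in for the adaptive encoding; `lzw_bits`, `static_bits`, and `choose_scheme` are hypothetical names, and the 256-symbol alphabet is an assumption.

```python
def lzw_bits(values, alphabet_size=256):
    """Bits a simple LZW coder would emit (assumed stand-in for adaptive encoding)."""
    table = {(v,): v for v in range(alphabet_size)}
    bits = 0
    current = ()
    for v in values:
        candidate = current + (v,)
        if candidate in table:
            current = candidate
        else:
            # Emit the code for `current` at the current code width.
            bits += max(1, (len(table) - 1).bit_length())
            table[candidate] = len(table)   # adaptively extend the codebook
            current = (v,)
    if current:
        bits += max(1, (len(table) - 1).bit_length())
    return bits


def static_bits(values, codebook):
    """Bits needed with a static codebook, or None if some value has no codeword."""
    if any(v not in codebook for v in values):
        return None
    return sum(len(codebook[v]) for v in values)


def choose_scheme(values, codebook):
    """Return (scheme, bits) for the scheme with the smaller encoded size."""
    adaptive = lzw_bits(values)
    static = static_bits(values, codebook)
    if static is not None and static < adaptive:
        return "static", static
    return "adaptive", adaptive


# Hypothetical static codebook left over from an earlier frame.
codebook = {45: "1", 120: "01", 200: "00"}
scheme, size = choose_scheme([45, 45, 120, 200, 45], codebook)
```

In this sketch the chosen scheme name would have to be signalled alongside the encoded data so that a decoder can pick the matching decoding scheme.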
In the moving image encoding device according to the present invention, the static codebook creating means may create a static codebook each time a frame image is adaptively encoded, and, when the encoding scheme selecting means selects static encoding, the encoding means may statically encode the frame image to be encoded using, among the plurality of static codebooks created by the static codebook creating means, the static codebook that yields the smallest amount of information after encoding that frame image.
According to the above configuration, static encoding can be performed using, among the plurality of static codebooks, the codebook that yields the smallest amount of encoded data. Encoding with a higher compression ratio can therefore be performed.
In the moving image encoding device according to the present invention, the static codebook creating means may hold the static codebooks it has created and, when the number of held static codebooks exceeds a predetermined number, discard them starting from the oldest.
According to the above configuration, when the number of held static codebooks exceeds a predetermined number, they are discarded starting from the oldest. This prevents the number of held static codebooks from growing without limit and thus prevents them from straining storage capacity.
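A bounded store that drops the oldest codebook first might look like the sketch below. The class name `CodebookStore`, its methods, and the use of a frame index as the key are assumptions made for this illustration.

```python
from collections import OrderedDict


class CodebookStore:
    """Keeps at most `max_books` static codebooks, discarding the oldest first."""

    def __init__(self, max_books):
        self.max_books = max_books
        self._books = OrderedDict()   # frame index -> codebook, oldest first

    def add(self, frame_index, codebook):
        self._books[frame_index] = codebook
        while len(self._books) > self.max_books:
            self._books.popitem(last=False)   # drop the oldest entry

    def get(self, frame_index):
        """Return the codebook created for `frame_index`, or None if discarded."""
        return self._books.get(frame_index)

    def frames(self):
        """Frame indices of the codebooks currently held, oldest first."""
        return list(self._books)


store = CodebookStore(max_books=2)
store.add(0, {45: "0", 120: "1"})
store.add(1, {45: "1", 120: "0"})
store.add(2, {200: "0", 45: "1"})   # exceeds the limit, so frame 0 is discarded
```

If the encoder and the decoder apply the same limit and the same discard rule, they hold the same set of codebooks at every frame.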
In the moving image encoding device according to the present invention, the moving image may be a multi-view moving image, and the encoding means may statically encode the frame image to be encoded using, among the plurality of static codebooks that the static codebook creating means created when frame images of different viewpoints were adaptively encoded, the static codebook that yields the smallest amount of information after encoding that frame image.
According to the above configuration, when static codebooks corresponding to frame images of a plurality of viewpoints exist, static encoding is performed using the static codebook that yields the smallest amount of information after encoding. Encoding with a higher compression ratio can therefore be performed.
In the moving image encoding device according to the present invention, when performing static encoding, the encoding means may use a static codebook other than any static codebook that lacks a codeword for one or more of the representative values of the frame image to be encoded.
According to the above configuration, encoding is performed using only a static codebook that is actually usable for static encoding of the frame image.
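Choosing among several held static codebooks, while skipping any codebook that lacks a codeword for some value of the current frame, can be sketched as below. The function names and the dictionary keyed by frame index are hypothetical.

```python
def encoded_bits(values, codebook):
    """Size in bits of `values` under `codebook`, or None if it is unusable."""
    if any(v not in codebook for v in values):
        return None   # some representative value has no codeword
    return sum(len(codebook[v]) for v in values)


def pick_codebook(values, codebooks):
    """Return (frame_index, bits) of the usable codebook with the smallest output.

    `codebooks` maps the index of the frame a codebook was created from to the
    codebook itself. Returns None when no held codebook can encode `values`.
    """
    best = None
    for frame_index, codebook in codebooks.items():
        bits = encoded_bits(values, codebook)
        if bits is None:
            continue   # exclude codebooks that cannot encode this frame
        if best is None or bits < best[1]:
            best = (frame_index, bits)
    return best


# Hypothetical codebooks created from frames 3, 5, and 7.
codebooks = {
    3: {45: "0", 120: "10", 200: "11"},
    5: {45: "11", 120: "0", 200: "10"},
    7: {45: "0", 120: "1"},            # no codeword for 200
}
choice = pick_codebook([120, 120, 45, 200], codebooks)
```

The returned frame index plays the role of information identifying which codebook was used, which a decoder would need in order to select the same one.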
In the moving image encoding device according to the present invention, the representative value may be a numerical value within a predetermined range, and the static codebook creating means may create a static codebook that also assigns codewords to those numerical values within the predetermined range that differ from the representative values in the frame image from which the static codebook is created.
According to the above configuration, a static codebook is created in which codewords are associated even with numerical values that are not representative values of the source frame. This prevents a situation in which static encoding cannot be performed because a corresponding codeword is missing.
A moving image decoding device according to the present invention is a moving image decoding device that decodes encoded image data, that is, data obtained by dividing each frame image of a moving image into a plurality of regions and encoding it with either adaptive encoding, which encodes a sequence of the representative values of the regions arranged in a predetermined order while adaptively updating a codebook that associates sequence patterns with codewords, or static encoding, which arranges the representative values in a predetermined order and encodes them by assigning to each representative value a codeword whose number of bits depends on the appearance rate of that representative value in the frame image. The moving image decoding device includes: acquiring means for acquiring the encoded image data and encoding information, which is information indicating the encoding scheme of the encoded image data; decoding means for decoding, for each piece of encoded image data corresponding to a frame image, the encoded image data with the decoding scheme corresponding to the encoding scheme indicated by the encoding information acquired by the acquiring means, thereby generating decoded data; and image generating means for generating each frame image of the moving image from the decoded data generated by the decoding means and information indicating the regions.
A method of controlling a moving image decoding device according to the present invention is a method of controlling a moving image decoding device that decodes encoded image data, that is, data obtained by dividing each frame image of a moving image into a plurality of regions and encoding it with either adaptive encoding, which encodes a sequence of the representative values of the regions arranged in a predetermined order while adaptively updating a codebook that associates sequence patterns with codewords, or static encoding, which arranges the representative values in a predetermined order and encodes them by assigning to each representative value a codeword whose number of bits depends on the appearance rate of that representative value in the frame image. The method includes, in the moving image decoding device: an acquiring step of acquiring the encoded image data and encoding information, which is information indicating the encoding scheme of the encoded image data; a decoding step of decoding, for each piece of encoded image data corresponding to a frame image, the encoded image data with the decoding scheme corresponding to the encoding scheme indicated by the encoding information acquired in the acquiring step, thereby generating decoded data; and an image generating step of generating each frame image of the moving image from the decoded data generated in the decoding step and information indicating the regions.
According to the above configuration or method, the device decodes encoded image data that was obtained by dividing each frame image of a moving image into a plurality of regions and encoding it with either adaptive encoding, which encodes a sequence of the representative values of the regions arranged in a predetermined order while adaptively updating a codebook that associates sequence patterns with codewords, or static encoding, which arranges the representative values in a predetermined order and encodes them by assigning to each representative value a codeword whose number of bits depends on the appearance rate of that representative value in the frame image. An image is then generated from the decoded data and the information indicating the regions.
As a result, each piece of encoded image data corresponding to a frame image can be decoded appropriately: adaptive decoding is applied to adaptively encoded image data, and static decoding is applied to statically encoded image data.
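The per-frame dispatch on the encoding information can be sketched as follows. The string flags `"adaptive"` and `"static"`, the bit-string payload, and the `adaptive_decode` callable passed in by the caller are assumptions for this illustration; the static codebook is assumed to be prefix-free.

```python
def static_decode(bits, codebook):
    """Decode a bit string with a prefix-free static codebook."""
    inverse = {code: value for value, code in codebook.items()}
    values = []
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:        # a complete codeword has been read
            values.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return values


def decode_frame(payload, scheme, adaptive_decode, codebook=None):
    """Decode one frame's payload with the scheme named by its encoding information."""
    if scheme == "adaptive":
        return adaptive_decode(payload)
    if scheme == "static":
        if codebook is None:
            raise ValueError("static decoding requires a codebook")
        return static_decode(payload, codebook)
    raise ValueError(f"unknown encoding scheme: {scheme}")


# Hypothetical codebook and a placeholder adaptive decoder for demonstration.
codebook = {45: "1", 120: "01", 200: "00"}
decoded = decode_frame("1101001", "static", adaptive_decode=list, codebook=codebook)
```

The decoded sequence is then combined with the region information to rebuild the frame, following the same predetermined order used by the encoder.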
The moving image decoding device according to the present invention may include static codebook creating means that calculates the appearance rate of each representative value from the decoded data generated when the decoding means performs adaptive decoding, which is the decoding scheme corresponding to adaptive encoding, determines from the calculated appearance rate the number of bits of the codeword to be assigned to each representative value, and creates a static codebook that associates each representative value with a codeword of the determined number of bits, and the decoding means may perform static decoding, which corresponds to static encoding, on the encoded image data to be decoded using a static codebook that the static codebook creating means created from the decoded data generated when the decoding means adaptively decoded earlier encoded image data.
According to the above configuration, a previously created static codebook can be used when performing static decoding. No new static codebook has to be created at that point, which improves the efficiency of the static decoding process.
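As one way to picture the static codebook, the sketch below derives code words from appearance counts with Huffman coding, so that frequent representative values receive shorter code words. Huffman coding is an assumption made for illustration; the text above only requires that the number of bits depend on the appearance rate. All function names are hypothetical. Ties are broken deterministically, so an encoder and a decoder that run the same procedure on the same values would obtain the same codebook.

```python
import heapq
from collections import Counter
from typing import Dict, Iterable, List


def build_static_codebook(values: Iterable[int]) -> Dict[int, str]:
    """Map each representative value to a bit string (assumed: Huffman code)."""
    counts = Counter(values)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, smallest value in subtree, {value: partial code}).
    # The second field is unique per subtree, so the dicts are never compared.
    heap = [(c, v, {v: ""}) for v, c in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        c0, t0, m0 = heapq.heappop(heap)
        c1, t1, m1 = heapq.heappop(heap)
        merged = {v: "0" + code for v, code in m0.items()}
        merged.update({v: "1" + code for v, code in m1.items()})
        heapq.heappush(heap, (c0 + c1, min(t0, t1), merged))
    return heap[0][2]


def static_encode(values: Iterable[int], codebook: Dict[int, str]) -> str:
    """Concatenate the code words of the values, in order."""
    return "".join(codebook[v] for v in values)


def static_decode(bits: str, codebook: Dict[int, str]) -> List[int]:
    """Invert static_encode; relies on the codebook being prefix-free."""
    inverse = {code: v for v, code in codebook.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            decoded.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a code word")
    return decoded
```

In this sketch a codebook built from the values of an earlier, adaptively decoded frame is simply reused for a later frame, which is the saving the paragraph above describes.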
In the moving image decoding device according to the present invention, when the image encoded data has been statically encoded, the acquisition means may acquire image specifying information indicating the frame image from which the codebook used for that static encoding was created, and the decoding means may statically decode the statically encoded image encoded data using the static codebook that the static codebook creation means created when the image encoded data of the frame image indicated by the image specifying information was adaptively decoded.
According to the above configuration, the frame image from which the static codebook used for static encoding was created can be identified, so static decoding can be performed using the static codebook created when the image encoded data of that frame image was adaptively decoded.
Therefore, the static codebook to be used for static decoding can be selected appropriately before decoding.
In the moving image decoding device according to the present invention, the static codebook creation means may hold the static codebooks it has created and, when the number of held static codebooks exceeds a predetermined number, discard them starting from the oldest.
According to the above configuration, when the number of held static codebooks exceeds a predetermined number, they are discarded starting from the oldest. This prevents the number of held static codebooks from growing without limit and keeps them from putting pressure on storage capacity.
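The bounded retention of static codebooks can be sketched as a small store keyed by the frame from which each codebook was created. The class name `StaticCodebookStore`, the integer frame identifier standing in for the image specifying information, and the limit `max_codebooks` are assumptions for illustration.

```python
from collections import OrderedDict
from typing import Dict


class StaticCodebookStore:
    """Keep at most max_codebooks codebooks, discarding the oldest first."""

    def __init__(self, max_codebooks: int) -> None:
        if max_codebooks < 1:
            raise ValueError("max_codebooks must be at least 1")
        self._max = max_codebooks
        self._books: "OrderedDict[int, Dict[int, str]]" = OrderedDict()

    def put(self, frame_id: int, codebook: Dict[int, str]) -> None:
        """Store the codebook created when frame frame_id was adaptively decoded."""
        self._books[frame_id] = codebook
        while len(self._books) > self._max:
            self._books.popitem(last=False)  # remove the oldest entry

    def get(self, frame_id: int) -> Dict[int, str]:
        """Return the codebook created from frame_id; KeyError if discarded."""
        return self._books[frame_id]

    def __contains__(self, frame_id: int) -> bool:
        return frame_id in self._books
```

An encoder would presumably need to apply the same limit, so that it never refers to a codebook the decoder has already discarded.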
In the moving image decoding device according to the present invention, the representative value may be a numerical value within a predetermined range, and the static codebook creation means may create a static codebook that also assigns code words to those numerical values in the predetermined range that do not appear in the decoded data from which the static codebook is created.
According to the above configuration, the static codebook associates a code word even with a numerical value that did not occur as a representative value. This prevents the situation in which static encoding cannot be performed because a value has no corresponding code word.
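One possible way to give every value in the range a code word is to add a pseudo-count of one to each value before deriving code lengths, so that values absent from the decoded data still receive a (long) code word. The pseudo-count and the use of Huffman code lengths are assumptions for illustration and are not stated in the text; `code_lengths_for_range` is a hypothetical name.

```python
import heapq
from collections import Counter
from typing import Dict, Iterable


def code_lengths_for_range(values: Iterable[int],
                           low: int, high: int) -> Dict[int, int]:
    """Return a code word bit length for every value in [low, high]."""
    counts = Counter(values)
    if any(v < low or v > high for v in counts):
        raise ValueError("value outside the predetermined range")
    # Assumption: observed count plus a pseudo-count of 1 for every value.
    weights = {v: counts.get(v, 0) + 1 for v in range(low, high + 1)}
    if len(weights) == 1:
        return {low: 1}
    lengths = {v: 0 for v in weights}
    # Heap entries: (weight, smallest value in group, members of the group).
    heap = [(w, v, [v]) for v, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, t0, g0 = heapq.heappop(heap)
        w1, t1, g1 = heapq.heappop(heap)
        for v in g0 + g1:
            lengths[v] += 1  # each merge adds one bit to every member
        heapq.heappush(heap, (w0 + w1, min(t0, t1), g0 + g1))
    return lengths
```

With this choice, a frame that later contains a previously unseen value can still be statically encoded, at the cost of slightly longer code words for the values that did occur.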
A moving image transmission system including the above moving image encoding device and the above moving image decoding device can achieve the effects described above.
Note that the moving image encoding device and the moving image decoding device may be realized by a computer. In this case, a control program for the moving image encoding device and the moving image decoding device that realizes these devices on a computer by causing the computer to operate as each of the above means, and a computer-readable recording medium on which the control program is recorded, also fall within the scope of the present invention.
(Configuration by software)
Finally, each block of the moving image encoding device 1 (1A) and the moving image decoding device 2 (2A), in particular the image encoding unit 11 (11A), the image decoding unit 12, the distance image encoding unit 20 (20A) (image division processing unit 21 (21´), distance image division processing unit 22, distance value correction unit 23, number assigning unit 24, distance value encoding unit 25 (25A)), the distance value decoding unit 32, and the distance value assigning unit 33, may be realized in hardware by a logic circuit formed on an integrated circuit (IC chip), or in software using a CPU (central processing unit).
In the latter case, the moving image encoding device 1 (1A) and the moving image decoding device 2 (2A) each include a CPU that executes the instructions of a control program realizing each function, a ROM (read only memory) that stores the program, a RAM (random access memory) into which the program is loaded, and a storage device (recording medium) such as a memory that stores the program and various data. The object of the present invention can also be achieved by supplying to the moving image encoding device 1 (1A) and the moving image decoding device 2 (2A) a recording medium on which the program code (executable program, intermediate code program, or source program) of their control program, which is software realizing the functions described above, is recorded in computer-readable form, and by having the computer (or a CPU or MPU (microprocessor unit)) read and execute the program code recorded on the recording medium.
Examples of the recording medium include tapes such as magnetic tapes and cassette tapes; disks, including magnetic disks such as floppy (registered trademark) disks and hard disks, and optical discs such as CD-ROM (compact disc read-only memory), MO (magneto-optical), MD (Mini Disc), DVD (digital versatile disk), and CD-R (CD Recordable); cards such as IC cards (including memory cards) and optical cards; semiconductor memories such as mask ROM, EPROM (erasable programmable read-only memory), EEPROM (electrically erasable and programmable read-only memory), and flash ROM; and logic circuits such as PLDs (programmable logic devices) and FPGAs (field programmable gate arrays).
The moving image encoding device 1 (1A) and the moving image decoding device 2 (2A) may also be configured to be connectable to a communication network, and the program code may be supplied via the communication network. The communication network is not particularly limited as long as it can transmit the program code. For example, the Internet, an intranet, an extranet, a LAN (local area network), ISDN (integrated services digital network), a VAN (value-added network), a CATV (community antenna television) communication network, a virtual private network, a telephone line network, a mobile communication network, or a satellite communication network can be used. The transmission medium constituting the communication network may likewise be any medium capable of transmitting the program code and is not limited to a specific configuration or type. For example, wired media such as IEEE (Institute of Electrical and Electronics Engineers) 1394, USB, power line carrier, cable TV lines, telephone lines, and ADSL (asymmetric digital subscriber line) lines can be used, as can wireless media such as infrared (IrDA (Infrared Data Association) or a remote control), Bluetooth (registered trademark), IEEE 802.11 wireless, HDR (high data rate), NFC (Near Field Communication), DLNA (Digital Living Network Alliance), mobile phone networks, satellite links, and terrestrial digital networks. The present invention can also be realized in the form of a computer data signal embedded in a carrier wave, in which the program code is embodied by electronic transmission.
The present invention can be suitably applied to content generation devices that generate 3D-compatible content, content playback devices that play back 3D-compatible content, and the like.
DESCRIPTION OF SYMBOLS
1 Moving image encoding device (moving image encoding device)
2 Moving image decoding device (moving image decoding device)
11, 11A Image encoding unit
12 Image decoding unit
22 Distance image division processing unit (image dividing means)
24 Number assigning unit (representative value determining means)
25, 25A Distance value encoding unit (encoding method selection means, encoding means, static codebook creation means)
31 Unpackaging unit (acquisition means)
32 Distance value decoding unit (decoding means, static codebook creation means)
33 Distance value assigning unit (image generating means)
Claims (19)
- A moving image encoding device for encoding a moving image, comprising:
image dividing means for dividing each frame image of the moving image into a plurality of regions;
representative value determining means for determining a representative value of each region divided by the image dividing means;
encoding means for generating encoded data by performing at least one of adaptive encoding, in which, for each frame image, a number sequence in which the representative values determined by the representative value determining means are arranged in a predetermined order is encoded while an adaptive codebook associating sequence patterns with code words is adaptively updated, and static encoding, in which, for each frame image, the representative values determined by the representative value determining means are arranged in a predetermined order and each representative value is encoded by assigning it a code word whose number of bits differs depending on the appearance rate of that representative value in the frame image; and
encoding method selection means for selecting, for each frame image, either the adaptive encoding or the static encoding,
wherein the encoding means encodes the frame image using the encoding method selected by the encoding method selection means and generates encoded data.
- The moving image encoding device according to claim 1, further comprising static codebook creation means for, when the adaptive encoding is performed, calculating the appearance rate of each representative value in the frame image subjected to the adaptive encoding, determining the number of bits of the code word to be assigned to each representative value according to the calculated appearance rate, and creating a static codebook associating each representative value with a code word of the determined number of bits,
wherein, when the encoding method selection means selects static encoding, the encoding means statically encodes the frame image to be encoded using a static codebook that the static codebook creation means created when a previous frame image was adaptively encoded.
- The moving image encoding device according to claim 2, wherein the encoding method selection means selects, from adaptive encoding and static encoding, the encoding method that yields the smaller amount of information after encoding of the frame image to be encoded.
- The moving image encoding device according to claim 2 or 3, wherein the static codebook creation means creates a static codebook each time a frame image is adaptively encoded, and
when the encoding method selection means selects static encoding, the encoding means statically encodes the frame image to be encoded using, among the plurality of static codebooks created by the static codebook creation means, the static codebook that yields the smallest amount of information after encoding of the frame image to be encoded.
- The moving image encoding device according to claim 4, wherein the static codebook creation means holds the static codebooks it has created and, when the number of held static codebooks exceeds a predetermined number, discards them starting from the oldest.
- The moving image encoding device according to claim 4 or 5, wherein the moving image is a moving image of a plurality of viewpoints, and
the encoding means statically encodes the frame image to be encoded using, among a plurality of static codebooks that the static codebook creation means created when frame images of different viewpoints were adaptively encoded, the static codebook that yields the smallest amount of information after encoding of the frame image to be encoded.
- The moving image encoding device according to any one of claims 2 to 6, wherein, when performing static encoding, the encoding means uses a static codebook other than any static codebook that lacks a code word corresponding to a representative value of the frame image to be encoded.
- The moving image encoding device according to any one of claims 2 to 7, wherein the representative value is a numerical value within a predetermined range, and
the static codebook creation means creates a static codebook that also assigns code words to those numerical values in the predetermined range that differ from the representative values in the frame image for which the static codebook is created.
- A moving image decoding device for decoding image encoded data, which is data obtained by dividing each frame image of a moving image into a plurality of regions and encoding by either adaptive encoding, in which a number sequence in which the representative values of the regions are arranged in a predetermined order is encoded while a codebook associating sequence patterns with code words is adaptively updated, or static encoding, in which the representative values are arranged in a predetermined order and each representative value is encoded by assigning it a code word whose number of bits differs depending on the appearance rate of that representative value in the frame image, the moving image decoding device comprising:
acquisition means for acquiring the image encoded data and encoding information, which is information indicating the encoding method of the image encoded data;
decoding means for decoding, for each piece of image encoded data corresponding to a frame image, the image encoded data by the decoding method corresponding to the encoding method indicated by the encoding information acquired by the acquisition means, thereby generating decoded data; and
image generation means for generating each frame image of the moving image from the decoded data generated by the decoding means and information indicating the regions.
- The moving image decoding device according to claim 9, further comprising static codebook creation means for calculating the appearance rate of each representative value from the decoded data generated when the decoding means performs adaptive decoding, which is the decoding method corresponding to adaptive encoding, determining the number of bits of the code word to be assigned to each representative value according to the calculated appearance rate, and creating a static codebook associating each representative value with a code word of the determined number of bits,
wherein the decoding means performs static decoding, which is the decoding method corresponding to static encoding, on the image encoded data to be decoded, using a static codebook that the static codebook creation means created from the decoded data generated when the decoding means adaptively decoded earlier image encoded data.
- The moving image decoding device according to claim 10, wherein, when the image encoded data has been statically encoded, the acquisition means acquires image specifying information indicating the frame image from which the codebook used for that static encoding was created, and
the decoding means statically decodes the statically encoded image encoded data using the static codebook that the static codebook creation means created when the image encoded data of the frame image indicated by the image specifying information was adaptively decoded.
- The moving image decoding device according to claim 11, wherein the static codebook creation means holds the static codebooks it has created and, when the number of held static codebooks exceeds a predetermined number, discards them starting from the oldest.
- The moving image decoding device according to any one of claims 10 to 12, wherein the representative value is a numerical value within a predetermined range, and
the static codebook creation means creates a static codebook that also assigns code words to those numerical values in the predetermined range that are not included in the decoded data from which the static codebook is created.
- A moving image transmission system including the moving image encoding device according to any one of claims 1 to 8 and the moving image decoding device according to any one of claims 9 to 13.
- A method for controlling a moving image encoding device that encodes a moving image, the method comprising, in the moving image encoding device:
an image dividing step of dividing each frame image of the moving image into a plurality of regions;
a representative value determining step of determining a representative value of each region divided in the image dividing step;
an encoding method selection step of selecting, for each frame image, either adaptive encoding, in which a number sequence in which the representative values determined in the representative value determining step are arranged in a predetermined order is encoded while an adaptive codebook associating sequence patterns with code words is adaptively updated, or static encoding, in which the representative values determined in the representative value determining step are arranged in a predetermined order and each representative value is encoded by assigning it a code word whose number of bits differs depending on the appearance rate of that representative value in the frame image; and
an encoding step of performing at least one of the adaptive encoding and the static encoding, in which the frame image is encoded using the encoding method selected in the encoding method selection step to generate encoded data.
- A method for controlling a moving image decoding device that decodes image encoded data, which is data obtained by dividing each frame image of a moving image into a plurality of regions and encoding by either adaptive encoding, in which a number sequence in which the representative values of the regions are arranged in a predetermined order is encoded while a codebook associating sequence patterns with code words is adaptively updated, or static encoding, in which the representative values are arranged in a predetermined order and each representative value is encoded by assigning it a code word whose number of bits differs depending on the appearance rate of that representative value in the frame image, the method comprising, in the moving image decoding device:
an acquisition step of acquiring the image encoded data and encoding information, which is information indicating the encoding method of the image encoded data;
a decoding step of decoding, for each piece of image encoded data corresponding to a frame image, the image encoded data by the decoding method corresponding to the encoding method indicated by the encoding information acquired in the acquisition step, thereby generating decoded data; and
an image generation step of generating each frame image of the moving image from the decoded data generated in the decoding step and information indicating the regions.
- A control program for a moving image encoding device, for operating the moving image encoding device according to any one of claims 1 to 8, the control program causing a computer to function as each of the above means.
- A control program for a moving image decoding device, for operating the moving image decoding device according to any one of claims 9 to 13, the control program causing a computer to function as each of the above means.
- A computer-readable recording medium on which the control program according to at least one of claims 17 and 18 is recorded.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2010-247425 | 2010-11-04 | ||
JP2010247425 | 2010-11-04 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012060172A1 true WO2012060172A1 (en) | 2012-05-10 |
Family
ID=46024293
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2011/072291 WO2012060172A1 (en) | 2010-11-04 | 2011-09-28 | Movie image encoding device, movie image decoding device, movie image transmitting system, method of controlling movie image encoding device, method of controlling movie image decoding device, movie image encoding device controlling program, movie image decoding device controlling program, and recording medium |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2012060172A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113727105A (en) * | 2021-09-08 | 2021-11-30 | 北京医百科技有限公司 | Depth map compression method, device, system and storage medium |
JP2022510733A (en) * | 2019-02-22 | 2022-01-27 | グーグル エルエルシー | Compression of entire multiple images |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08316843A (en) * | 1995-03-14 | 1996-11-29 | Ricoh Co Ltd | Coder |
JPH09289638A (en) * | 1996-04-23 | 1997-11-04 | Nec Corp | Three-dimensional image encoding/decoding system |
WO2004071102A1 (en) * | 2003-01-20 | 2004-08-19 | Sanyo Electric Co,. Ltd. | Three-dimensional video providing method and three-dimensional video display device |
JP2009017430A (en) * | 2007-07-09 | 2009-01-22 | Brother Ind Ltd | Data file transmitting apparatus, program, data file transmission method and data structure |
- 2011-09-28 WO PCT/JP2011/072291 patent/WO2012060172A1/en active Application Filing
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2022510733A (en) * | 2019-02-22 | 2022-01-27 | グーグル エルエルシー | Compression of entire multiple images |
JP7147075B2 (en) | 2019-02-22 | 2022-10-04 | グーグル エルエルシー | Compression across multiple images |
CN113727105A (en) * | 2021-09-08 | 2021-11-30 | 北京医百科技有限公司 | Depth map compression method, device, system and storage medium |
CN113727105B (en) * | 2021-09-08 | 2022-04-26 | 北京医百科技有限公司 | Depth map compression method, device, system and storage medium |
EP4354862A1 (en) | Systems and methods for end-to-end feature compression in coding of multi-dimensional data | |
CN116248895B (en) | Video cloud transcoding method and system for virtual reality panorama roaming |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 11837825; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 11837825; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: JP |