WO2024145086A1 - Content derivation for geometric partitioning mode video coding - Google Patents
Content derivation for geometric partitioning mode video coding
- Publication number
- WO2024145086A1 WO2024145086A1 PCT/US2023/084924 US2023084924W WO2024145086A1 WO 2024145086 A1 WO2024145086 A1 WO 2024145086A1 US 2023084924 W US2023084924 W US 2023084924W WO 2024145086 A1 WO2024145086 A1 WO 2024145086A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- blending
- content type
- index
- subset
- indices
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/119—Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
Definitions
- a first subset of blending indices is associated with a screen content type and a second subset of blending indices is associated with a camera content type.
- the first subset of blending indices includes a first blending index corresponding to a quarter of a blending area size, a second blending index corresponding to a half of the blending area size, and a third blending index corresponding to the blending area size
- the second subset of blending indices includes the third blending index, a fourth blending index corresponding to double the blending area size, and a fifth blending index corresponding to quadruple the blending area size.
- the content type is the screen content type
- the first blending index is signaled using one bin
- each of the second blending index and the third blending index is signaled using more than one bin.
- determining the content type for the video data comprises: decoding one or more syntax elements signaled within a bitstream associated with the video data; and determining the content type based on the decoded one or more syntax elements.
- the content type is determined using one or both of a prediction sample or a motion vector for a coding unit associated with the video data.
- the one or more processors are configured to execute the instructions to: determine the content type as one of a screen content type or a camera content type.
- the subset of blending indices is a first subset of blending indices where the content type is the screen content type or a second subset of blending indices where the content type is the camera content type.
- the first subset of blending indices includes a first blending index corresponding to a quarter of a blending area size and signaled using one bin, a second blending index corresponding to a half of the blending area size and signaled using more than one bin, and a third blending index corresponding to the blending area size and signaled using more than one bin
- the second subset of blending indices includes the third blending index, a fourth blending index corresponding to double the blending area size and signaled using more than one bin, and a fifth blending index corresponding to quadruple the blending area size and signaled using more than one bin.
- the one or more processors execute the instructions to: determine the content type based on one or more syntax elements signaled from a bitstream associated with the video data.
- FIG. 1 is a schematic of an example of a video encoding and decoding system.
- FIG. 2 is a block diagram of an example of a computing device that can implement a transmitting station or a receiving station.
- FIG. 3 is a diagram of an example of a video stream to be encoded and decoded.
- FIG. 4 is a block diagram of an example of an encoder.
- FIG. 5 is a block diagram of an example of a decoder.
- FIG. 6 is an illustration of examples of portions of a video frame.
- FIG. 7 is an illustration of examples of geometric partitions of a coding unit (CU).
- FIG. 10 is a flowchart diagram of an example of a technique for content derivation for geometric partitioning mode video coding.
- Video compression schemes may include breaking respective images, or frames, of a video stream into smaller portions, such as blocks, or coding tree units (CTUs), and generating an encoded bitstream using techniques to limit the information included for respective CTUs thereof.
- the bitstream can be decoded to re-create the source frames from the limited information.
- Encoding CTUs to or decoding CTUs from a bitstream can include predicting the values of pixels or CTUs based on similarities with other pixels or CTUs which have already been coded in the same frame, using intra-prediction, or in one or more different frames, using inter-prediction.
- the result of an intra- or inter-prediction mode performed against a CU is a prediction unit (PU).
- a prediction residual can be determined based on a difference between the pixel values of the CU and the pixel values of the PU.
- the prediction residual and the prediction mode used to ultimately obtain that prediction residual can then be encoded to a bitstream.
- the prediction residual is reconstructed into a CU using a PU produced based on the prediction mode and is thereafter included in an output video stream.
- Conventional video codecs include functionality for partitioning a CTU or CU into one or more smaller CUs, such as to enable the different processing of each of the smaller CUs based on the specific video data thereof.
- conventional partitioning schemes are generally limited such as to rectangular shapes or certain CU sizes and/or based on the type of inter- or intra-prediction being performed.
- such conventional partitioning schemes may be suboptimal in some cases, such as where object edges do not align with rectangular CU boundaries.
- VVC Versatile Video Coding
- the geometric partitioning mode enables a CU to be partitioned other than into rectangular shapes in order to better control the prediction of objects with edges that do not align with rectangular CU boundaries.
- VVC supports 64 different partitions using the geometric partitioning mode, in which each partition splits a given CU into two partitions.
- the location of the splitting line for a given geometric partition of the geometric partitioning mode is mathematically derived from the angle and offset parameters of a specific partition.
- Each part of a given geometric partition in a CU is predicted (e.g., inter-predicted) using its own motion. Because each geometric partition has one motion vector and one reference index, prediction using a geometric partition is generally limited to unidirectional prediction. This constraint is applied to ensure that only two motion compensated predictions are used for each CU.
- the sample values along the geometric partition edge are adjusted using a blending processing with adaptive weighting to form the final prediction for each of those partitioned areas.
- the transform and quantization processes are applied to the whole CU, rather than to each geometric partition individually, as is done with other prediction modes.
- the motion field for the CU predicted using the geometric partition mode is then stored for later use.
- a unidirectional prediction candidate list is derived directly from the merge candidate list.
- a motion vector Mv0 is determined for the first geometric partition
- a motion vector Mv1 is determined for the other geometric partition
- One motion vector type is stored for each position in the motion field of a geometric partition mode coded CU.
- the frame 306 may be further subdivided into CTUs 310, which can contain data corresponding to, for example, NxM pixels in the frame 306, in which N and M may refer to the same integer value or to different integer values.
- the CTUs 310 can also be arranged to include data from one or more slices 308 of pixel data.
- the CTUs 310 can be of any suitable size, such as 4x4 pixels, 8x8 pixels, 16x8 pixels, 8x16 pixels, 16x16 pixels, or larger up to a maximum size, which may be 128x128 pixels or another NxM pixels size.
- the video frame 600 may include CTUs larger than 64x64 and/or CUs smaller than 4x4. Subject to features within the video frame 600 and/or other criteria, the video frame 600 may be partitioned into various arrangements. Although one arrangement of CUs is shown, any arrangement may be used. Although FIG. 6 shows NxN CTUs and CUs, in some implementations, NxM CTUs and/or CUs may be used, wherein N and M are different numbers. For example, 32x64 CTUs, 64x32 CTUs, 16x32 CUs, 32x16 CUs, or any other size may be used. In some implementations, Nx2N CTUs or CUs, 2NxN CTUs or CUs, or a combination thereof, may be used.
- a CU is geometrically partitioned into two partitions in which a prediction signal for a first geometric partition is determined using a motion vector from a first reference picture list (e.g., L0) and a prediction signal for a second geometric partition is determined using a motion vector from a second reference picture list (e.g., L1).
- the most frequently used blending index may be the third blending index 906, as it provides the smallest blending area size available for the blending indices of the second subset of blending indices.
- one bin may be used to signal the use of the third blending index 906 for camera content, while two bins may be used to signal each of the fourth blending index 908 and the fifth blending index 910 for camera content.
- a first cost (e.g., a first rate-distortion cost) can be determined for the encoded first copy and a second cost (e.g., a second rate-distortion cost) can be determined for the encoded second copy.
- the content type for the video data may thus be determined as the one of the screen content type or the camera content type based on a lowest one of the first cost and the second cost.
- the encoded first copy and the encoded second copy may each be produced using all blending indices instead of using only a subset of blending indices.
- the content type is determined as the camera content type and thus the video data is determined to be camera content.
- the content type may be determined as the screen content type and thus the video data is determined to be screen content; otherwise, the content type may be determined as the camera content type and thus the video data is determined to be camera content.
- a blending index is determined for geometric partitions of a CU of the video data to encode or decode based on the content type determined for the video data.
- a first subset of blending indices is used where the content type is the screen content type
- a second subset of blending indices is used where the content type is the camera content type.
- the first subset of blending indices includes a first blending index corresponding to a quarter of a blending area size (e.g., of a default blending area size), a second blending index corresponding to a half of the blending area size, and a third blending index corresponding to the blending area size.
- the second subset of blending indices includes the third blending index, a fourth blending index corresponding to double the blending area size, and a fifth blending index corresponding to quadruple the blending area size.
- the blending index determined for the geometric partitions of the CU is the first blending index
- the blending index determined is the third blending index. This is because the first blending index uses a smallest blending area for the first subset of blending indices corresponding to the screen content type and the third blending index uses a smallest blending area for the second subset of blending indices corresponding to the camera content type.
- the determined content type may be verified, such as by repeating one or more of the above operations described for determining the content type using one or more other CUs.
- the blending index determined for the current CU may be the third blending index regardless of the subset of blending indices from which it is determined (i.e., regardless of the determined content type).
- blending is performed against prediction signals of the geometric partitions according to the blending index determined therefor to produce a prediction unit.
- two integer blending matrices W0 and W1 are used, in which each of those matrices corresponds to a different one of the geometric partitions and thus is used for a different one of the prediction signals.
- the weights in W0 and W1 are derived from a ramp function based on a displacement from a predicted sample position to the boundary (i.e., splitting line) between the geometric partitions.
- the prediction unit is encoded (e.g., to a bitstream) or decoded (e.g., for output within an output video stream), based on whether the technique 1000 is performed during encoding or decoding.
- the prediction unit may be used as reference data for the prediction of one or more other CUs, whether in a same frame as the CU which includes the geometric partitions based on which the prediction unit was produced or a different frame.
- the technique 1000 includes signaling, based on the content type, an indication of the blending index used to perform the blending against the prediction signals of the geometric partitions. Because the first blending index provides the smallest blending area size for screen content from amongst the blending indices in the first subset and the third blending index provides the smallest blending area size for camera content from amongst the blending indices in the second subset, the first blending index is the most frequently used blending index in the first subset of blending indices and the third blending index is the most frequently used blending index in the second subset of blending indices.
- a smallest number of bins is used to signal the first blending index as the blending index where the content type is the screen content type or the third blending index as the blending index where the content type is the camera content type
- the other blending indices for each content type may be signaled using a larger number of bins.
- one bin may be used to signal the use of the first blending index for screen content or the use of the third blending index for camera content
- two bins may be used to signal the use of the second or third blending index for screen content or the use of the fourth or fifth blending index for camera content.
- a constraint or other configuration of the coder used to perform the prediction of the CU which includes the geometric partitions may limit the available blending indices for each content type to a single blending index (e.g., the first blending index for screen content and the third blending index for camera content).
- the blending index determined for the CU is determined based on content type, and the technique 1000 may accordingly omit signaling the blending index.
- Implementations of the transmitting station 102 and/or the receiving station 106 can be realized in hardware, software, or any combination thereof.
- the hardware can include, for example, computers, intellectual property (IP) cores, application-specific integrated circuits (ASICs), programmable logic arrays, optical processors, programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit.
- the term “signal processor” should be understood as encompassing any of the foregoing hardware, either singly or in combination.
- the terms “signal” and “data” are used interchangeably. Further, portions of the transmitting station 102 and the receiving station 106 do not necessarily have to be implemented in the same manner.
- the transmitting station 102 or the receiving station 106 can be implemented using a general purpose computer or general purpose processor with a computer program that, when executed, carries out any of the respective methods, algorithms, and/or instructions described herein.
- a special purpose computer/processor can be utilized which can contain other hardware for carrying out any of the methods, algorithms, or instructions described herein.
- the transmitting station 102 and the receiving station 106 can, for example, be implemented on computers in a video conferencing system.
- the transmitting station 102 can be implemented on a server, and the receiving station 106 can be implemented on a device separate from the server, such as a handheld communications device.
- the transmitting station 102 can encode content into an encoded video signal and transmit the encoded video signal to the communications device.
- the communications device can then decode the encoded video signal.
- the communications device can decode content stored locally on the communications device, for example, content that was not transmitted by the transmitting station 102.
- Other suitable transmitting and receiving implementation schemes are available.
- the receiving station 106 can be a generally stationary personal computer rather than a portable communications device.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Content derivation for geometric partitioning mode video coding is performed to determine a blending index for adaptive blending of the geometric partitions for a coding unit. The blending index may be signaled based on the derived content type for the subject video data. A content type is determined for video data to encode or decode. A blending index is determined for geometric partitions of a coding unit of the video data based on the content type. Blending is performed against prediction signals of the geometric partitions according to the blending index to produce a prediction unit. The prediction unit is then encoded (e.g., to a bitstream) or decoded (e.g., for storage or display). The content type may be a screen content type or a camera content type, and the subset of blending indices from which the blending index is determined is based on that content type.
Description
CONTENT DERIVATION FOR GEOMETRIC PARTITIONING MODE
VIDEO CODING
BACKGROUND
[0001] Digital video streams may represent video using a sequence of frames or still images. Digital video can be used for various applications including, for example, video conferencing, high definition video entertainment, video advertisements, or sharing of user-generated videos. A digital video stream can contain a large amount of data and consume a significant amount of computing or communication resources of a computing device for processing, transmission, or storage of the video data. Various approaches have been proposed to reduce the amount of data in video streams, including encoding or decoding techniques.
SUMMARY
[0002] Disclosed herein are, inter alia, systems and techniques for content derivation for geometric partitioning mode video coding.
[0003] A method for content derivation for geometric partitioning mode video coding according to an implementation of this disclosure comprises: determining a content type for video data to encode or decode; determining a blending index for geometric partitions of a coding unit of the video data based on the content type; performing blending against prediction signals of the geometric partitions according to the blending index to produce a prediction unit; and encoding or decoding the prediction unit.
[0004] In some implementations of the method, the blending index is determined from a subset of blending indices associated with the content type.
[0005] In some implementations of the method, a first subset of blending indices is associated with a screen content type and a second subset of blending indices is associated with a camera content type.
[0006] In some implementations of the method, the first subset of blending indices includes a first blending index corresponding to a quarter of a blending area size, a second blending index corresponding to a half of the blending area size, and a third blending index corresponding to the blending area size, and the second subset of blending indices includes the third blending index, a fourth blending index corresponding to double the blending area size, and a fifth blending index corresponding to quadruple the blending area size.
[0007] In some implementations of the method, where the content type is the screen content type, the first blending index is signaled using one bin, and each of the second blending index and the third blending index is signaled using more than one bin.
[0008] In some implementations of the method, where the content type is the camera content type, the third blending index is signaled using one bin, and each of the fourth blending index and the fifth blending index is signaled using more than one bin.
[0009] In some implementations of the method, during decoding, determining the content type for the video data comprises: decoding one or more syntax elements signaled within a bitstream associated with the video data; and determining the content type based on the decoded one or more syntax elements.
[0010] In some implementations of the method, during decoding, the content type is determined using one or both of a prediction sample or a motion vector for a coding unit associated with the video data.
[0011] In some implementations of the method, during encoding, the method comprises: signaling, within a bitstream which includes the encoded coding unit, one or both of an indication of the content type for the video data or an indication of the blending index.
[0012] An apparatus for content derivation for geometric partitioning mode video coding according to an implementation of this disclosure comprises: one or more memories and one or more processors configured to execute instructions stored in the one or more memories to: determine, from a subset of blending indices associated with a content type for video data to encode or decode, a blending index for geometric partitions of a coding unit of the video data; produce a prediction unit by performing blending against prediction signals of the geometric partitions according to the blending index; and encode or decode the prediction unit.
[0013] In some implementations of the apparatus, the one or more processors are configured to execute the instructions to: determine the content type as one of a screen content type or a camera content type.
[0014] In some implementations of the apparatus, the subset of blending indices is a first subset of blending indices where the content type is the screen content type or a second subset of blending indices where the content type is the camera content type.
[0015] In some implementations of the apparatus, the first subset of blending indices includes a first blending index corresponding to a quarter of a blending area size and signaled using one bin, a second blending index corresponding to a half of the blending area size and signaled using more than one bin, and a third blending index corresponding to the blending area size and signaled using more than one bin, and the second subset of blending indices includes the third blending index, a fourth blending index corresponding to double the blending area size and
signaled using more than one bin, and a fifth blending index corresponding to quadruple the blending area size and signaled using more than one bin.
[0016] In some implementations of the apparatus, to determine the content type as the one of the screen content type or the camera content type, the one or more processors execute the instructions to: determine the content type based on one or more syntax elements signaled from a bitstream associated with the video data.
[0017] In some implementations of the apparatus, the one or more processors are configured to execute the instructions to: signal the content type within a bitstream to which the prediction unit is encoded.
[0018] A non-transitory computer-readable storage device including program instructions executable by one or more processors that, when executed, cause the one or more processors to perform operations for content derivation for geometric partitioning mode video coding, in an implementation, includes operations comprising: determining a content type for video data to encode or decode; producing a prediction unit by performing blending against prediction signals of the geometric partitions of a coding unit of the video data according to a blending index determined, based on the content type, for the geometric partitions; and outputting the prediction unit for encoding or decoding.
[0019] In some implementations of the non-transitory computer-readable storage device, the operations comprise: determining the blending index from a subset of blending indices associated with the content type.
[0020] In some implementations of the non-transitory computer-readable storage device, the subset of blending indices is a first subset of blending indices where the content type is a screen content type or a second subset of blending indices where the content type is a camera content type.
[0021] In some implementations of the non-transitory computer-readable storage device, the first subset of blending indices includes a first blending index corresponding to a quarter of a blending area size, a second blending index corresponding to a half of the blending area size, and a third blending index corresponding to the blending area size, and the second subset of blending indices includes the third blending index, a fourth blending index corresponding to double the blending area size, and a fifth blending index corresponding to quadruple the blending area size.
[0022] In some implementations of the non-transitory computer-readable storage device, where the content type is the screen content type, the first blending index is signaled using one bin, and each of the second blending index and the third blending index is signaled using more than one bin, and, where the content type is the camera content type, the third blending index is
signaled using one bin, and each of the fourth blending index and the fifth blending index is signaled using more than one bin.
[0023] These and other aspects of this disclosure are disclosed in the following detailed description of the implementations, the appended claims and the accompanying figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0024] The description herein makes reference to the accompanying drawings described below, wherein like reference numerals refer to like parts throughout the several views.
[0025] FIG. 1 is a schematic of an example of a video encoding and decoding system.
[0026] FIG. 2 is a block diagram of an example of a computing device that can implement a transmitting station or a receiving station.
[0027] FIG. 3 is a diagram of an example of a video stream to be encoded and decoded.
[0028] FIG. 4 is a block diagram of an example of an encoder.
[0029] FIG. 5 is a block diagram of an example of a decoder.
[0030] FIG. 6 is an illustration of examples of portions of a video frame.
[0031] FIG. 7 is an illustration of examples of geometric partitions of a coding unit (CU).
[0032] FIG. 8 is an illustration of an example of reference picture lists from which motion vectors are selected for predicting motion in geometric partitions of a CU.
[0033] FIG. 9 is an illustration of examples of blending area sizes used for adaptive blending of geometric partitions of a CU.
[0034] FIG. 10 is a flowchart diagram of an example of a technique for content derivation for geometric partitioning mode video coding.
DETAILED DESCRIPTION
[0035] Video compression schemes may include breaking respective images, or frames, of a video stream into smaller portions, such as blocks, or coding tree units (CTUs), and generating an encoded bitstream using techniques to limit the information included for respective CTUs thereof. The bitstream can be decoded to re-create the source frames from the limited information. Encoding CTUs to or decoding CTUs from a bitstream can include predicting the values of pixels or CTUs based on similarities with other pixels or CTUs which have already been coded in the same frame, using intra-prediction, or in one or more different frames, using inter-prediction. Intra-prediction attempts to predict the pixel values of a CU of a CTU using pixels peripheral to the CU (e.g., pixels that are in the same frame as the CU, but which are outside the CU). Inter-prediction attempts to predict the pixel values of a CU of a CTU using pixels corresponding to the same video content in one or more other frames (e.g., pixels that are
either co-located with the pixels of the CU in another frame or nearby the co-located portion of the CU in another frame).
[0036] During encoding, the result of an intra- or inter-prediction mode performed against a CU is a prediction unit (PU). A prediction residual can be determined based on a difference between the pixel values of the CU and the pixel values of the PU. The prediction residual and the prediction mode used to ultimately obtain that prediction residual can then be encoded to a bitstream. During decoding, the prediction residual is reconstructed into a CU using a PU produced based on the prediction mode and is thereafter included in an output video stream.
[0037] Conventional video codecs include functionality for partitioning a CTU or CU into one or more smaller CUs, such as to enable the different processing of each of the smaller CUs based on the specific video data thereof. The partitioning of a CTU or CU into the one or more CUs results in a different prediction residual being determined for each of those CUs, and thus a different transform unit (TU) being used to transform the video data of each of those CUs, in which each of those TUs is of a size corresponding to the size of the corresponding CU. In some such conventional video codecs, a recursive partitioning scheme may be used, such as for quadtree partitioning of a CU. A given quadtree is defined by one or more factors, such as a minimum available transform block size, a maximum available transform block size, and a maximum transform split depth. Regardless of the particular manner of video block partitioning, conventional partitioning schemes are generally limited such as to rectangular shapes or certain CU sizes and/or based on the type of inter- or intra-prediction being performed. However, such conventional partitioning schemes may be suboptimal in some cases, such as where object edges do not align with rectangular CU boundaries.
[0038] One example of a solution to this is the geometric partitioning mode, used with the H.266 codec, also referred to as Versatile Video Coding (VVC). The geometric partitioning mode enables a CU to be partitioned other than into rectangular shapes in order to better control the prediction of objects with edges that do not align with rectangular CU boundaries. Currently, VVC supports 64 different partitions using the geometric partitioning mode, in which each partition splits a given CU into two partitions. The location of the splitting line for a given geometric partition of the geometric partitioning mode is mathematically derived from the angle and offset parameters of a specific partition. Each part of a given geometric partition in a CU is predicted (e.g., inter-predicted) using its own motion. Because each geometric partition has one motion vector and one reference index, prediction using a geometric partition is generally limited to unidirectional prediction. This constraint is applied to ensure that only two motion compensated predictions are used for each CU.
[0039] The use of the geometric partitioning mode for a given CU is signaled using a flag at
the CU-level, as a kind of merge mode (e.g., regular merge mode, merge with motion vector difference (MMVD) mode, combined inter/intra prediction (CIIP) mode, or subblock merge mode). A geometric partition index indicating the specific geometric partition used (e.g., based on the angle and offset thereof) and indicating two merge indices (i.e., one for each of the two resulting partitions) are signaled along with the flag. During encoding, prior to the signaling of the above information within a bitstream, or during decoding, using such signaled information, after each of the two geometrically partitioned areas have been predicted, the sample values along the geometric partition edge (i.e., the splitting line) are adjusted using a blending processing with adaptive weighting to form the final prediction for each of those partitioned areas. During encoding, the transform and quantization processes are applied to the whole CU, rather than to each geometric partition individually, as is done with other prediction modes. The motion field for the CU predicted using the geometric partition mode is then stored for later use.
[0040] A unidirectional prediction candidate list is derived directly from the merge candidate list. The index of a given type of unidirectional prediction motion in the unidirectional prediction candidate list may be denoted as n. A motion vector LX of the n-th extended merge candidate, in which L refers to the motion vector being from a reference motion vector list and X is equal to the parity of n, is used as the n-th unidirectional prediction motion vector for the geometric partitioning mode. For a given merge index, a motion vector will be used from either list zero, denoted as L0, or list one, denoted as L1. Where a corresponding LX motion vector of the n-th extended merge candidate does not exist, the L(1 − X) motion vector of the same candidate is used instead as the unidirectional prediction motion vector for the geometric partition mode.
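As an illustration of the parity rule just described, the following sketch (with hypothetical helper and argument names, assuming each merge candidate exposes optional L0 and L1 motion vectors) selects the unidirectional prediction motion vector for the n-th candidate; it is not the normative derivation.

```python
def gpm_uni_prediction_mv(merge_candidates, n):
    """Pick the uni-prediction motion vector for the n-th GPM merge candidate.

    X = parity of n selects list LX; if the candidate has no LX motion vector,
    the L(1 - X) motion vector of the same candidate is used instead.
    Each candidate is assumed to be a dict like {"L0": mv_or_None, "L1": mv_or_None}.
    """
    x = n & 1
    candidate = merge_candidates[n]
    mv = candidate[f"L{x}"]
    return mv if mv is not None else candidate[f"L{1 - x}"]
```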
[0041] After each of the two geometric partitions is predicted using its own motion, blending is applied to the two resulting prediction signals to derive samples around the edge (i.e., the splitting line) between the geometric partitions. The blending weights for each position within the CU are derived based on the distance, denoted as d(x, y), between the given position and the partition edge, as:
d(x, y) = (2x + 1 − w) · cos(φ_i) + (2y + 1 − h) · sin(φ_i) − ρ_j
[0042] in which:
ρ_j = ρ_x,j · cos(φ_i) + ρ_y,j · sin(φ_i)
[0043] where i and j are indices for the angle and offset of a geometric partition and depend on the signaled geometric partition index. The values ρ_x,j and ρ_y,j can be derived from a lookup table based on the values of i, j, w (the width of the CU), and h (the height of the CU). The weights for each part of a geometric partition are derived as:
wIdxL(x, y) = partIdx ? 32 + d(x, y) : 32 − d(x, y)
[0044] in which:
w0(x, y) = Clip3(0, 8, (wIdxL(x, y) + 4) >> 3) / 8
and
w1(x, y) = 1 − w0(x, y)
[0045] and in which the value of partIdx depends on the angle index.
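A floating-point sketch of the weight derivation above, assuming the angle φ_i and the offsets ρ_x,j and ρ_y,j have already been obtained for the signaled geometric partition index (the normative derivation uses integer cosine and offset lookup tables, which are not reproduced here):

```python
import math

def gpm_blending_weights(x, y, w, h, phi_i, rho_x_j, rho_y_j, part_idx):
    """Derive the blending weights w0 and w1 for one sample position (x, y).

    d(x, y) is the signed distance to the splitting line, wIdxL maps it onto a
    ramp around 32, and the weights are clipped to [0, 8] and normalized.
    """
    rho_j = rho_x_j * math.cos(phi_i) + rho_y_j * math.sin(phi_i)
    d = (2 * x + 1 - w) * math.cos(phi_i) + (2 * y + 1 - h) * math.sin(phi_i) - rho_j
    w_idx_l = 32 + d if part_idx else 32 - d
    w0 = min(max((int(w_idx_l) + 4) >> 3, 0), 8) / 8.0  # Clip3(0, 8, (wIdxL + 4) >> 3) / 8
    w1 = 1.0 - w0
    return w0, w1
```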
[0046] Referring to the motion field storage aspect of geometric partitioning mode in VVC, a motion vector Mv0 is determined for the first geometric partition, a motion vector Mv1 is determined for the other geometric partition, and a combined motion vector Mv is determined using Mv0 and Mv1. One motion vector type is stored for each position in the motion field of a geometric partition mode coded CU. The stored motion vector type for each individual position in the motion field is determined as:
sType = abs(motionIdx) < 32 ? 2 : (motionIdx <= 0 ? (1 − partIdx) : partIdx)
[0047] where motionIdx is equal to d(4x + 2, 4y + 2), which is calculated based on the above, and partIdx depends on the angle index. If sType is equal to 0 or 1, Mv0 or Mv1, respectively, is stored in the corresponding motion field; otherwise, if sType is equal to 2, Mv is stored in the corresponding motion field. If Mv0 and Mv1 are from different reference picture lists (i.e., one from L0 and the other from L1), then Mv0 and Mv1 are combined to form the bi-prediction motion vectors; otherwise, if Mv0 and Mv1 are from the same reference picture list, only the unidirectional motion vector Mv1 is stored.
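A small sketch of the storage rule above, with hypothetical argument names; motionIdx and partIdx are assumed to have been computed as described:

```python
def gpm_stored_motion(motion_idx, part_idx, mv0, mv1, mv_combined):
    """Select which motion is stored at one motion-field position.

    sType 0 stores Mv0, sType 1 stores Mv1, and sType 2 stores the combined
    motion Mv (bi-prediction when Mv0 and Mv1 come from different lists).
    """
    if abs(motion_idx) < 32:
        s_type = 2
    else:
        s_type = (1 - part_idx) if motion_idx <= 0 else part_idx
    return (mv0, mv1, mv_combined)[s_type]
```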
[0048] As with VVC, the final prediction samples may then be generated by blending the two prediction signals using a weighted average. To perform the blending, two integer blending matrices W0 and W1 are used. The weights in W0 and W1 are derived from a ramp function based on a displacement from a predicted sample position to the boundary (i.e., splitting line) between the geometric partitions. The blending area size is generally fixed to two (i.e., two samples on each side of the boundary between the geometric partitions). In enhanced compression model (ECM), a line of reference software used to research video codec functionality beyond VVC, the blending process is improved by the addition of four blending area sizes (i.e., quarter, half, double, and quadruple of the existing area size). A CU-level index is signaled to indicate the selected blending area size for blending the two prediction signals. Index 2, the blending area used in VVC, is signaled using 1 bin, while the four blending areas introduced in ECM are each signaled using 3 bins. To further accommodate the additional blending areas in ECM, extended weighting precision is utilized so that the maximum value of the weight used is changed from 8, the value used in VVC, to 32.
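The stated bin counts imply a binarization along the following lines; the table below is a hypothetical prefix code that is merely consistent with those counts (1 bin for the default index, 3 bins for each of the four ECM sizes), not the actual ECM binarization.

```python
# Hypothetical bin strings consistent with the stated signaling costs.
ECM_BLENDING_INDEX_BINS = {
    2: "0",    # default blending area size used in VVC: 1 bin
    0: "100",  # quarter of the default area size: 3 bins
    1: "101",  # half of the default area size: 3 bins
    3: "110",  # double the default area size: 3 bins
    4: "111",  # quadruple the default area size: 3 bins
}
```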
[0049] However, the geometric partitioning and blending approaches used in VVC and in ECM have shortcomings that limit their efficacy. First, the geometric partitioning mode in VVC is not efficient for screen content. The geometric partitioning mode in VVC is a single mode operating against all types of video content. However, there are multiple types of video content, such as screen content (e.g., video content from a screen share, presentation recording, or videogame stream) and camera content (e.g., video content captured using one or more cameras, such as movies, television shows, and mobile/smartphone videos), and the geometric partitioning mode has limited efficacy with the screen content type. For example, because object edges in screen content are typically sharper than in camera content, blending performed in connection with the use of the geometric partitioning mode for screen content often results in quality loss, which may appear as blurring around the object edges within the video. Second, while ECM requires the blending index to be signaled, the signaling cost of the four blending indices introduced in ECM is 3 bins and thus very high. Third, to ensure that optimal coding efficiency is achieved, the blending index which is a best fit for a given CU needs to be selected based on rate-distortion costs. However, such rate-distortion optimization operations are computationally expensive for practical video encoders, and so it would be highly burdensome to impose this for each CU to be coded.
[0050] Implementations of this disclosure address problems such as these by deriving a content type for video data to encode or decode as well as determining and signaling a blending index to use for the video data based on the derived content type. Recognizing the differences in blending effects on screen content versus camera content, the implementations of this disclosure teach determining a content type for video data to encode or decode (e.g., one or more CUs of one or more frames of an input video stream during encoding or of a bitstream during decoding). A blending index (e.g., one of the five used in ECM) is determined for a current CU which includes geometric partitions based on the content type determination. Blending is performed against the geometric partitions according to the blending index, in which the blending index defines the size of the blending area used, to produce a prediction unit, which is then encoded or decoded, as applicable. During encoding, the blending index, in some cases alongside the determined content type, may then be signaled (e.g., at the CU-level) within the bitstream.
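A minimal sketch of this decision flow, using hypothetical names: the derived content type selects the subset of blending indices, and the most frequently used index in each subset (the smallest blending area available for that content type) is given the shortest signaling, consistent with the bin counts described elsewhere in this disclosure.

```python
# Hypothetical subsets and bin strings; "full" denotes the default blending area size.
SCREEN_SUBSET = {"quarter": "0", "half": "10", "full": "11"}
CAMERA_SUBSET = {"full": "0", "double": "10", "quadruple": "11"}

def blending_indices_for(content_type):
    """Return the blending indices (and their bins) available for the content type."""
    return SCREEN_SUBSET if content_type == "screen" else CAMERA_SUBSET

def default_blending_index(content_type):
    """The smallest blending area in the subset, signaled with one bin."""
    return "quarter" if content_type == "screen" else "full"
```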
[0051] While reference is made herein by example to CTUs, CUs, PUs, and the like, as are commonly used in video codecs such as H.265, referred to as High-Efficiency Video Coding (HEVC), and H.266, the implementations of this disclosure may be used with other video coding structures. In one particular but non-limiting example, the implementations of this disclosure may be used with superblocks, macroblocks, blocks, and the like, as are commonly used in video codecs such as VP9, AV1, and the currently in-development AV2. Accordingly, references herein to particular video coding structures such as CTUs, CUs, PUs, and the like shall be regarded as expressions of non-limiting example video coding structures with which the implementations of this disclosure may be used.
[0052] Further details of techniques for content derivation for geometric partitioning mode video coding are described herein with initial reference to a system in which such techniques can be implemented. FIG. 1 is a schematic of an example of a video encoding and decoding system 100. A transmitting station 102 can be, for example, a computer having an internal configuration of hardware such as that described in FIG. 2. However, other implementations of the transmitting station 102 are possible. For example, the processing of the transmitting station 102 can be distributed among multiple devices.
[0053] A network 104 can connect the transmitting station 102 and a receiving station 106 for encoding and decoding of the video stream. Specifically, the video stream can be encoded in the transmitting station 102, and the encoded video stream can be decoded in the receiving station 106. The network 104 can be, for example, the Internet. The network 104 can also be a local area network (LAN), wide area network (WAN), virtual private network (VPN), cellular telephone network, or any other means of transferring the video stream from the transmitting station 102 to, in this example, the receiving station 106.
[0054] The receiving station 106, in one example, can be a computer having an internal configuration of hardware such as that described in FIG. 2. However, other suitable implementations of the receiving station 106 are possible. For example, the processing of the receiving station 106 can be distributed among multiple devices.
[0055] Other implementations of the video encoding and decoding system 100 are possible. For example, an implementation can omit the network 104. In another implementation, a video stream can be encoded and then stored for transmission at a later time to the receiving station 106 or any other device having memory. In one implementation, the receiving station 106 receives (e.g., via the network 104, a computer bus, and/or some communication pathway) the encoded video stream and stores the video stream for later decoding. In an example implementation, a real-time transport protocol (RTP) is used for transmission of the encoded video over the network 104. In another implementation, a transport protocol other than RTP may be used (e.g., a Hypertext Transfer Protocol-based (HTTP-based) video streaming protocol).
[0056] When used in a video conferencing system, for example, the transmitting station 102 and/or the receiving station 106 may include the ability to both encode and decode a video stream as described below. For example, the receiving station 106 could be a video conference participant who receives an encoded video bitstream from a video conference server (e.g., the transmitting station 102) to decode and view and further encodes and transmits his or her own video bitstream to the video conference server for decoding and viewing by other participants. [0057] In some implementations, the video encoding and decoding system 100 may instead be used to encode and decode data other than video data. For example, the video encoding and decoding system 100 can be used to process image data. The image data may include a block of data from an image (e.g., a CTU of a frame of a video stream). In such an implementation, the transmitting station 102 may be used to encode the image data and the receiving station 106 may be used to decode the image data.
[0058] Alternatively, the receiving station 106 can represent a computing device that stores the encoded image data for later use, such as after receiving the encoded or pre-encoded image data from the transmitting station 102. As a further alternative, the transmitting station 102 can represent a computing device that decodes the image data, such as prior to transmitting the decoded image data to the receiving station 106 for display.
[0059] FIG. 2 is a block diagram of an example of a computing device 200 that can implement a transmitting station or a receiving station. For example, the computing device 200 can implement one or both of the transmitting station 102 and the receiving station 106 of FIG.
1. The computing device 200 can be in the form of a computing system including multiple computing devices, or in the form of one computing device, for example, a mobile phone, a tablet computer, a laptop computer, a notebook computer, a desktop computer, and the like. [0060] A processor 202 in the computing device 200 can be a conventional central processing unit. Alternatively, the processor 202 can be another type of device, or multiple devices, capable of manipulating or processing information now existing or hereafter developed. For example, although the disclosed implementations can be practiced with one processor as shown (e.g., the processor 202), advantages in speed and efficiency can be achieved by using more than one processor.
[0061] A memory 204 in computing device 200 can be a read only memory (ROM) device or a random access memory (RAM) device in an implementation. However, other suitable types of storage device can be used as the memory 204. The memory 204 can include code and data 206 that is accessed by the processor 202 using a bus 212. The memory 204 can further include an operating system 208 and application programs 210, the application programs 210 including at least one program that permits the processor 202 to perform the techniques described herein. For
example, the application programs 210 can include applications 1 through N, which further include encoding and/or decoding software that performs, amongst other things, content derivation for geometric partitioning mode video coding as described herein.
[0062] The computing device 200 can also include a secondary storage 214, which can, for example, be a memory card used with a mobile computing device. Because the video communication sessions may contain a significant amount of information, they can be stored in whole or in part in the secondary storage 214 and loaded into the memory 204 as needed for processing.
[0063] The computing device 200 can also include one or more output devices, such as a display 218. The display 218 may be, in one example, a touch sensitive display that combines a display with a touch sensitive element that is operable to sense touch inputs. The display 218 can be coupled to the processor 202 via the bus 212. Other output devices that permit a user to program or otherwise use the computing device 200 can be provided in addition to or as an alternative to the display 218. When the output device is or includes a display, the display can be implemented in various ways, including by a liquid crystal display (LCD), a cathode-ray tube (CRT) display, or a light emitting diode (LED) display, such as an organic LED (OLED) display.
[0064] The computing device 200 can also include or be in communication with an image-sensing device 220, for example, a camera, or any other image-sensing device 220 now existing or hereafter developed that can sense an image such as the image of a user operating the computing device 200. The image-sensing device 220 can be positioned such that it is directed toward the user operating the computing device 200. In an example, the position and optical axis of the image-sensing device 220 can be configured such that the field of vision includes an area that is directly adjacent to the display 218 and from which the display 218 is visible.
[0065] The computing device 200 can also include or be in communication with a sound-sensing device 222, for example, a microphone, or any other sound-sensing device now existing or hereafter developed that can sense sounds near the computing device 200. The sound-sensing device 222 can be positioned such that it is directed toward the user operating the computing device 200 and can be configured to receive sounds, for example, speech or other utterances, made by the user while the user operates the computing device 200.
[0066] Although FIG. 2 depicts the processor 202 and the memory 204 of the computing device 200 as being integrated into one unit, other configurations can be utilized. The operations of the processor 202 can be distributed across multiple machines (wherein individual machines can have one or more processors) that can be coupled directly or across a local area or other network. The memory 204 can be distributed across multiple machines such as a network-based memory or memory in multiple machines performing the operations of the computing device
200.
[0067] Although depicted here as one bus, the bus 212 of the computing device 200 can be composed of multiple buses. Further, the secondary storage 214 can be directly coupled to the other components of the computing device 200 or can be accessed via a network and can comprise an integrated unit such as a memory card or multiple units such as multiple memory cards. The computing device 200 can thus be implemented in a wide variety of configurations. [0068] FIG. 3 is a diagram of an example of a video stream 300 to be encoded and decoded. The video stream 300 includes a video sequence 302. At the next level, the video sequence 302 includes a number of adjacent video frames 304. While three frames are depicted as the adjacent frames 304, the video sequence 302 can include any number of adjacent frames 304. The adjacent frames 304 can then be further subdivided into individual video frames, for example, a frame 306.
[0069] At the next level, the frame 306 can be divided into a series of planes or segments 308. The segments 308 can be subsets of frames that permit parallel processing, for example. The segments 308 can also be subsets of frames that can separate the video data into separate colors. For example, a frame 306 of color video data can include a luminance plane and two chrominance planes. The segments 308 may be sampled at different resolutions.
[0070] Whether or not the frame 306 is divided into segments 308, the frame 306 may be further subdivided into CTUs 310, which can contain data corresponding to, for example, NxM pixels in the frame 306, in which N and M may refer to the same integer value or to different integer values. The CTUs 310 can also be arranged to include data from one or more segments 308 of pixel data. The CTUs 310 can be of any suitable size, such as 4x4 pixels, 8x8 pixels, 16x8 pixels, 8x16 pixels, 16x16 pixels, or larger up to a maximum size, which may be 128x128 pixels or another NxM pixel size.
[0071] FIG. 4 is a block diagram of an example of an encoder 400. The encoder 400 can be implemented, as described above, in the transmitting station 102, such as by providing a computer software program stored in memory, for example, the memory 204. The computer software program can include machine instructions that, when executed by a processor such as the processor 202, cause the transmitting station 102 to encode video data in the manner described in FIG. 4. The encoder 400 can also be implemented as specialized hardware included in, for example, the transmitting station 102. In some implementations, the encoder 400 is a hardware encoder.
[0072] The encoder 400 has the following stages to perform the various functions in a forward path (shown by the solid connection lines) to produce an encoded or compressed bitstream 420 using the video stream 300 as input: an intra/inter prediction stage 402, a
transform stage 404, a quantization stage 406, and an entropy encoding stage 408. The encoder 400 may also include a reconstruction path (shown by the dotted connection lines) to reconstruct a frame for encoding of future CTUs. In FIG. 4, the encoder 400 has the following stages to perform the various functions in the reconstruction path: a dequantization stage 410, an inverse transform stage 412, a reconstruction stage 414, and a loop filtering stage 416. Other structural variations of the encoder 400 can be used to encode the video stream 300.
[0073] In some cases, the functions performed by the encoder 400 may occur after a filtering of the video stream 300. That is, the video stream 300 may undergo pre-processing according to one or more implementations of this disclosure prior to the encoder 400 receiving the video stream 300. Alternatively, the encoder 400 may itself perform such pre-processing against the video stream 300 prior to proceeding to perform the functions described with respect to FIG. 4, such as prior to the processing of the video stream 300 at the intra/inter prediction stage 402. [0074] When the video stream 300 is presented for encoding after the pre-processing is performed, respective adjacent frames 304, such as the frame 306, can be processed in units of CTUs. At the intra/inter prediction stage 402, respective CUs of a CTU can be encoded using intra-frame prediction (also called intra-prediction) or inter-frame prediction (also called inter-prediction). In any case, a PU can be formed. In the case of intra-prediction, a PU may be formed from samples in the current frame that have been previously encoded and reconstructed. In the case of inter-prediction, a PU may be formed from samples in one or more previously constructed reference frames.
[0075] Next, the PU can be subtracted from the CU at the intra/inter prediction stage 402 to produce a prediction residual, also called a residual. The transform stage 404 transforms the residual into transform coefficients in, for example, the frequency domain using block-based transforms. The quantization stage 406 converts the transform coefficients into discrete quantum values, which are referred to as quantized transform coefficients, using a quantizer value or a quantization level. For example, the transform coefficients may be divided by the quantizer value and truncated.
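By way of non-limiting illustration only, the divide-and-truncate form of quantization described above can be sketched as follows; the function names and the use of floating-point division here are illustrative assumptions, and actual quantizer design, scaling, and rounding are codec-specific.

```python
def quantize(transform_coefficients, quantizer_value):
    # Divide each transform coefficient by the quantizer value and truncate
    # toward zero to obtain the quantized transform coefficients.
    return [int(c / quantizer_value) for c in transform_coefficients]

def dequantize(quantized_coefficients, quantizer_value):
    # Approximate inverse used on the reconstruction path; the truncation
    # above is lossy, so this only approximates the original coefficients.
    return [q * quantizer_value for q in quantized_coefficients]
```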
[0076] The quantized transform coefficients are then entropy encoded by the entropy encoding stage 408. The entropy-encoded coefficients, together with other information used to decode the CU (which may include, for example, syntax elements such as used to indicate the type of prediction used, transform type, motion vectors, a quantizer value, or the like), are then output to the compressed bitstream 420. The compressed bitstream 420 can be formatted using various techniques, such as variable length coding or arithmetic coding. The compressed bitstream 420 can also be referred to as an encoded video stream or encoded video bitstream, and the terms will be used interchangeably herein.
[0077] The reconstruction path (shown by the dotted connection lines) can be used to ensure that the encoder 400 and a decoder 500 (described below with respect to FIG. 5) use the same reference frames to decode the compressed bitstream 420. The reconstruction path performs functions that are similar to functions that take place during the decoding process (described below with respect to FIG. 5), including dequantizing the quantized transform coefficients at the dequantization stage 410 and inverse transforming the dequantized transform coefficients at the inverse transform stage 412 to produce a derivative prediction residual (also called a derivative residual). At the reconstruction stage 414, the PU that was predicted at the intra/inter prediction stage 402 can be added to the derivative residual to create a reconstructed CU. The loop filtering stage 416 can apply an in-loop filter or other filter to the reconstructed CU to reduce distortion such as blocking artifacts. Examples of filters which may be applied at the loop filtering stage 416 include, without limitation, a deblocking filter, a directional enhancement filter, and a loop restoration filter.
[0078] Other variations of the encoder 400 can be used to encode the compressed bitstream 420. In some implementations, a non-transform based encoder can quantize the residual signal directly without the transform stage 404 for certain CUs, CTUs, or frames. In some implementations, an encoder can have the quantization stage 406 and the dequantization stage 410 combined in a common stage.
[0079] FIG. 5 is a block diagram of an example of a decoder 500. The decoder 500 can be implemented in the receiving station 106, for example, by providing a computer software program stored in the memory 204. The computer software program can include machine instructions that, when executed by a processor such as the processor 202, cause the receiving station 106 to decode video data in the manner described in FIG. 5. The decoder 500 can also be implemented in hardware included in, for example, the transmitting station 102 or the receiving station 106. In some implementations, the decoder 500 is a hardware decoder.
[0080] The decoder 500, similar to the reconstruction path of the encoder 400 discussed above, includes in one example the following stages to perform various functions to produce an output video stream 516 from the compressed bitstream 420: an entropy decoding stage 502, a dequantization stage 504, an inverse transform stage 506, an intra/inter prediction stage 508, a reconstruction stage 510, a loop filtering stage 512, and a post filter stage 514. Other structural variations of the decoder 500 can be used to decode the compressed bitstream 420.
[0081] When the compressed bitstream 420 is presented for decoding, the data elements within the compressed bitstream 420 can be decoded by the entropy decoding stage 502 to produce a set of quantized transform coefficients. The dequantization stage 504 dequantizes the quantized transform coefficients (e.g., by multiplying the quantized transform coefficients by the
quantizer value), and the inverse transform stage 506 inverse transforms the dequantized transform coefficients to produce a derivative residual that can be identical to that created by the inverse transform stage 412 in the encoder 400. Using header information decoded from the compressed bitstream 420, the decoder 500 can use the intra/inter prediction stage 508 to create the same PU as was created in the encoder 400 (e.g., at the intra/inter prediction stage 402). [0082] At the reconstruction stage 510, the PU can be added to the derivative residual to create a reconstructed CU. The loop filtering stage 512 can be applied to the reconstructed CU to reduce blocking artifacts. Examples of filters which may be applied at the loop filtering stage 512 include, without limitation, a deblocking filter, a directional enhancement filter, and a loop restoration filter. Other filtering can be applied to the reconstructed CU. In this example, the post filter stage 514 is applied to the reconstructed CU to reduce blocking distortion, and the result is output as the output video stream 516. The output video stream 516 can also be referred to as a decoded video stream, and the terms will be used interchangeably herein.
[0083] Other variations of the decoder 500 can be used to decode the compressed bitstream 420. In some implementations, the decoder 500 can produce the output video stream 516 without the post filter stage 514 or otherwise omit the post filter stage 514.
[0084] FIG. 6 is an illustration of examples of portions of a video frame 600, which may, for example, be the frame 306 shown in FIG. 3. The video frame 600 includes a number of 64x64 CTUs, such as four 64x64 CTUs 610 in two rows and two columns in a matrix or Cartesian plane, as shown. Each 64x64 CTU 610 may include up to four 32x32 CUs 620. Each 32x32 CU 620 may include up to four 16x16 CUs 630. Each 16x16 CU 630 may include up to four 8x8 CUs 640. Each 8x8 CU 640 may include up to four 4x4 CUs 950. Each 4x4 CU 950 may include 16 pixels, which may be represented in four rows and four columns in each respective CU in the Cartesian plane or matrix.
[0085] In some implementations, the video frame 600 may include CTUs larger than 64x64 and/or CUs smaller than 4x4. Subject to features within the video frame 600 and/or other criteria, the video frame 600 may be partitioned into various arrangements. Although one arrangement of CUs is shown, any arrangement may be used. Although FIG. 6 shows NxN CTUs and CUs, in some implementations, NxM CTUs and/or CUs may be used, wherein N and M are different numbers. For example, 32x64 CTUs, 64x32 CTUs, 16x32 CUs, 32x16 CUs, or any other size may be used. In some implementations, Nx2N CTUs or CUs, 2NxN CTUs or CUs, or a combination thereof, may be used.
[0086] The pixels may include information representing an image captured in the video frame 600, such as luminance information, color information, and location information. In some implementations, a block, such as a 16x16 pixel block as shown, may include a luminance block 660, which may include luminance pixels 662; and two chrominance blocks 670, 680, such as a U or Cb chrominance block 670, and a V or Cr chrominance block 680. The chrominance blocks 670, 680 may include chrominance pixels 690. For example, the luminance block 660 may include 16x16 luminance pixels 662 and each chrominance block 670, 680 may include 8x8 chrominance pixels 690 as shown.
[0087] In some implementations, coding the video frame 600 may include ordered block-level coding. Ordered block-level coding may include coding CUs of the video frame 600 in an order, such as raster-scan order, wherein CUs may be identified and processed starting with a CTU in the upper left corner of the video frame 600, or portion of the video frame 600, and proceeding along rows from left to right and from the top row to the bottom row, identifying each CU in turn for processing. For example, the 64x64 CTU in the top row and left column of the video frame 600 may be the first CTU coded and the 64x64 CTU immediately to the right of the first CTU may be the second CTU coded. The second row from the top may be the second row coded, such that the 64x64 CTU in the left column of the second row may be coded after the 64x64 CTU in the rightmost column of the first row.
[0088] In some implementations, coding a CTU of the video frame 600 may include using quad-tree coding, which may include coding smaller CUs within a CTU in raster-scan order. For example, the 64x64 CTU shown in the bottom left corner of the portion of the video frame 600 may be coded using quad-tree coding wherein the top left 32x32 CU may be coded, then the top right 32x32 CU may be coded, then the bottom left 32x32 CU may be coded, and then the bottom right 32x32 CU may be coded. Each 32x32 CU may be coded using quad-tree coding wherein the top left 16x16 CU may be coded, then the top right 16x16 CU may be coded, then the bottom left 16x16 CU may be coded, and then the bottom right 16x16 CU may be coded. Each 16x16 CU may be coded using quad-tree coding wherein the top left 8x8 CU may be coded, then the top right 8x8 CU may be coded, then the bottom left 8x8 CU may be coded, and then the bottom right 8x8 CU may be coded. Each 8x8 CU may be coded using quad-tree coding wherein the top left 4x4 CU may be coded, then the top right 4x4 CU may be coded, then the bottom left 4x4 CU may be coded, and then the bottom right 4x4 CU may be coded. In some implementations, 8x8 CUs may be omitted for a 16x16 CU, and the 16x16 CU may be coded using quad-tree coding wherein the top left 4x4 CU may be coded, then the other 4x4 CUs in the 16x16 CU may be coded in raster-scan order.
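As a non-limiting sketch of the quad-tree raster-scan coding order described above, the following assumes square CUs and a full split down to a hypothetical minimum CU size; real coders decide splits adaptively, so only the visiting order is illustrated here.

```python
def quad_tree_order(x, y, size, min_size=4):
    """Yield (x, y, size) for the CUs of a CTU in quad-tree raster-scan
    order: top left, top right, bottom left, then bottom right, recursing
    into each quadrant before moving to the next one."""
    if size <= min_size:
        yield (x, y, size)
        return
    half = size // 2
    for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
        yield from quad_tree_order(x + dx, y + dy, half, min_size)

# For a 64x64 CTU split down to 4x4 CUs, the first CUs visited are
# (0, 0, 4), (4, 0, 4), (0, 4, 4), and (4, 4, 4).
```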
[0089] In some implementations, coding the video frame 600 may include encoding the information included in the original version of the image or video frame by, for example, omitting some of the information from that original version of the image or video frame from a corresponding encoded image or encoded video frame. For example, the coding may include
reducing spectral redundancy, reducing spatial redundancy, or a combination thereof. Reducing spectral redundancy may include using a color model based on a luminance component (Y) and two chrominance components (U and V or Cb and Cr), which may be referred to as the YUV or YCbCr color model, or color space. Using the YUV color model may include using a relatively large amount of information to represent the luminance component of a portion of the video frame 600, and using a relatively small amount of information to represent each corresponding chrominance component for the portion of the video frame 600. For example, a portion of the video frame 600 may be represented by a high-resolution luminance component, which may include a 16x16 block of luma samples, and by two lower resolution chrominance components, each of which represents the portion of the image as an 8x8 block of chroma samples. A sample may indicate a value, for example, a value in the range from 0 to 255, and may be stored or transmitted using, for example, eight bits. Although this disclosure is described in reference to the YUV color model, another color model may be used. Reducing spatial redundancy may include transforming a CU into the frequency domain using, for example, a discrete cosine transform. For example, a unit of an encoder may perform a discrete cosine transform using transform coefficient values based on spatial frequency.
[0090] Although described herein with reference to matrix or Cartesian representation of the video frame 600 for clarity, the video frame 600 may be stored, transmitted, processed, or a combination thereof, in a data structure such that pixel values and/or luma and chroma samples may be efficiently represented for the video frame 600. For example, the video frame 600 may be stored, transmitted, processed, or any combination thereof, in a two-dimensional data structure such as a matrix as shown, or in a one-dimensional data structure, such as a vector array. Furthermore, although described herein as showing a chrominance subsampled image where U and V have half the resolution of Y, the video frame 600 may have different configurations for the color channels thereof. For example, referring still to the YUV color space, full resolution may be used for all color channels of the video frame 600. In another example, a color space other than the YUV color space may be used to represent the resolution of color channels of the video frame 600.
[0091] FIG. 7 is an illustration of examples of geometric partitions 700 of a CU. Twenty-four example CUs are shown, each with multiple example splitting lines, depicting at least some of the possible geometric partitions into which a given CU may be split. As has been mentioned above, a CU is partitioned into two geometric partitions in which the splitting line represents the boundary between those two geometric partitions. The location of the splitting line for a given geometric partition is derived from the angle and offset parameters of that partition.
[0092] FIG. 8 is an illustration of an example of reference picture lists 800 from which motion vectors are selected for predicting motion in geometric partitions of a CU. As has been mentioned above, each part of a given geometric partition in a CU is predicted using its own motion. Because each geometric partition has one motion vector and one reference index, prediction using a geometric partition is generally limited to unidirectional prediction. Using merge indices of a merge candidate list, a unidirectional prediction candidate list is derived for the geometric partitions. The index of a given type of unidirectional prediction motion in the unidirectional prediction candidate list may be denoted as n. A motion vector LX of the n-th extended merge candidate, in which L refers to the motion vector being from a reference motion vector list and X is equal to the parity of n, is used as the n-th unidirectional prediction motion vector for the geometric partitioning mode.
[0093] For a given merge index, a motion vector for one geometric partition will be used from either list zero, denoted as L0, or list one, denoted as L1, and the motion vector for the other geometric partition will be used from the other list. Where a corresponding LX motion vector of the n-th extended merge candidate does not exist, the L(1-X) motion vector of the same candidate is used instead as the unidirectional prediction motion vector for the geometric partition mode. The motion vector determined for a given geometric partition will be used to predict the motion of that geometric partition and thus to determine or otherwise produce a prediction signal for that geometric partition. Thus, a CU is geometrically partitioned into two partitions in which a prediction signal for a first geometric partition is determined using a motion vector from a first reference picture list (e.g., L0) and a prediction signal for a second geometric partition is determined using a motion vector from a second reference picture list (e.g., L1).
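A rough, non-limiting sketch of the parity-based selection of unidirectional prediction motion vectors described above follows; the candidate structure and field names are assumptions made only for illustration.

```python
def gpm_uni_prediction_mv(extended_merge_candidates, n):
    """Return the n-th unidirectional prediction motion vector and its
    reference picture list for geometric partitioning mode.  X is the parity
    of n; the LX motion vector of the n-th extended merge candidate is used
    when it exists, otherwise the L(1-X) motion vector of that candidate."""
    candidate = extended_merge_candidates[n]
    x = n & 1                        # parity of n selects list L0 or L1
    if candidate.mv[x] is not None:  # the LX motion vector exists
        return candidate.mv[x], x
    return candidate.mv[1 - x], 1 - x
```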
[0094] FIG. 9 is an illustration of examples of blending area sizes used for adaptive blending of geometric partitions of a CU. In particular, a graph 900 depicts two geometric partitions, Partition A and Partition B, on the left and right sides, respectively, of a line representing the boundary between those geometric partitions. The prediction signals resulting from predicting the motion of each of the geometric partitions of a CU using the motion vectors determined for those partitions are blended using a weighted average to further improve the prediction quality and thus the overall resulting quality of the video data once decoded and output for display. The blending is performed using a blending index selected based on a content type of the video data which includes the CU with the geometric partitions under prediction.
[0095] As shown, five blending indices are available for selection, in which each corresponds to a different size of a blending area (e.g., a number of pixels on each side of the boundary to be blended) to use for the geometric partitions. The five blending indices include a first blending index 902, denoted as T/4, corresponding to one quarter of a default blending area
size, a second blending index 904, denoted as T/2, corresponding to one half of the default blending area size, a third blending index 906, denoted as T, corresponding to the default blending area size, a fourth blending index 908, denoted as 2T, corresponding to double the default blending area size, and a fifth blending index 910, denoted as 4T, corresponding to quadruple the default blending area size. The default blending area size may, for example, be two samples on each side of the boundary between the geometric partitions. The blending indices 902 through 910 each correspond to a different line in the graph 900, in which a first line is shown as a solid line and is associated with the blending matrix W0, a second line is shown with long dashes and is associated with the blending matrix W1, a third line is shown with short dashes and is associated with the blending matrix W2, a fourth line is shown with a dash-dot pattern and is associated with the blending matrix W3, and a fifth line is shown with a dash-dot-dot pattern and is associated with the blending matrix W4.
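Expressed as a lookup, and assuming the example default blending area size of two samples on each side of the boundary, the five blending indices map to blending area sizes as sketched below; the index numbering 0 through 4 is illustrative only.

```python
# Scale factors relative to the default blending area size T.
BLENDING_SCALE = (0.25, 0.5, 1.0, 2.0, 4.0)  # T/4, T/2, T, 2T, 4T

def blending_area_size(blending_index, default_size_t=2.0):
    """Blending area size, in samples on each side of the partition
    boundary, for one of the five blending indices."""
    return default_size_t * BLENDING_SCALE[blending_index]
```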
[0096] Due to differences in video data of different video content types (e.g., video data of the screen content type and video data of the camera content type), certain of the five blending indices 902 through 910 are optimal for certain video content types. In particular, because a larger blending area size increases the possibility of varying samples being blended (e.g., different colors or shading), it also increases the possibility of the boundary between the geometric partitions being blurry where the objects along the partitions have sharp edges, as is common in screen content. Thus, the five blending indices are arranged into two subsets of blending indices, in which each of the subsets corresponds to one of the screen content type or the camera content type. In one particular example, the subset of blending indices corresponding to the screen content type includes the first blending index 902, the second blending index 904, and the third blending index 906, while the subset of blending indices corresponding to the camera content type includes the third blending index 906, the fourth blending index 908, and the fifth blending index 910.
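Using that same illustrative numbering, the grouping of blending indices into content-type subsets in this particular example can be sketched as:

```python
# Subsets of blending indices per content type, per the example above.
SCREEN_CONTENT_SUBSET = (0, 1, 2)  # T/4, T/2, T: sharp edges favor narrow blending
CAMERA_CONTENT_SUBSET = (2, 3, 4)  # T, 2T, 4T: natural content tolerates wider blending

def blending_index_subset(content_type):
    """Return the subset of blending indices for a derived content type."""
    return SCREEN_CONTENT_SUBSET if content_type == "screen" else CAMERA_CONTENT_SUBSET
```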
[0097] In cases where the blending index determined for a geometrically partitioned CU is signaled in a bitstream (e.g., during at least certain encoder-side implementations), the number of bins to use to signal the determined blending index is based on the subject content type of the video data. For example, where the video data is of the screen content type, the most frequently used blending index may be the first blending index 902, as it provides the smallest blending area size available for the blending indices of the first subset of blending indices. In such a case, one bin may be used to signal the use of the first blending index 902 for screen content, while two bins may be used to signal each of the second blending index 904 and the third blending index 906 for screen content. In another example, where the video data is of the camera content type, the most frequently used blending index may be the third blending index 906, as it provides
the smallest blending area size available for the blending indices of the second subset of blending indices. In such a case, one bin may be used to signal the use of the third blending index 906 for camera content, while two bins may be used to signal each of the fourth blending index 908 and the fifth blending index 910 for camera content.
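One non-limiting way to realize this bin assignment is a short truncated-unary-style binarization within each subset, sketched below; the exact bin strings and any context modeling are assumptions, since the description above fixes only the number of bins per index.

```python
# Illustrative bin strings: the most frequently used blending index in each
# subset is signaled with one bin, the remaining two indices with two bins.
SCREEN_CONTENT_BINS = {0: "0", 1: "10", 2: "11"}  # first blending index uses one bin
CAMERA_CONTENT_BINS = {2: "0", 3: "10", 4: "11"}  # third blending index uses one bin

def blending_index_bins(content_type, blending_index):
    table = SCREEN_CONTENT_BINS if content_type == "screen" else CAMERA_CONTENT_BINS
    return table[blending_index]
```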
[0098] Further details of techniques for content derivation for geometric partitioning mode video coding are now described. FIG. 10 is a flowchart diagram of an example of a technique 1000 for content derivation for geometric partitioning mode video coding. The technique 1000 may, for example, be wholly or partially performed at a prediction stage of an encoder used to encode a video stream (e.g., the intra/inter prediction stage 402) or a prediction stage of a decoder used to decode a bitstream (e.g., the intra/inter prediction stage 508).
[0099] The technique 1000 can be implemented, for example, as a software program that may be executed by computing devices such as the transmitting station 102 or the receiving station 106. For example, the software program can include machine-readable instructions that may be stored in a memory such as the memory 204 or the secondary storage 214, and that, when executed by a processor, such as the processor 202, may cause the computing device to perform the technique 1000. The technique 1000 can be implemented using specialized hardware or firmware. For example, a hardware component, such as a hardware coder, may be configured to perform the technique 1000. As explained above, some computing devices may have multiple memories or processors, and the operations described in the technique 1000 can be distributed using multiple processors, memories, or both. For simplicity of explanation, the technique 1000 is depicted and described herein as a series of steps or operations. However, the steps or operations in accordance with this disclosure can occur in various orders and/or concurrently. Additionally, other steps or operations not presented and described herein may be used.
Furthermore, not all illustrated steps or operations may be required to implement a technique in accordance with the disclosed subject matter.
[0100] At 1002, a content type is determined for video data to encode or decode. The video data generally refers to some or all of an input video stream to encode or some or all of an encoded bitstream to decode. The content type is determined as one of a screen content type, in which the video data is determined to be screen content, or a camera content type, in which the video data is determined to be camera content. The particular manner by which the content type is determined may differ based on whether the video data is being encoded or decoded.
[0101] During encoding, the content type can be determined based on a rate-distortion analysis in which rate-distortion costs are determined for each of the screen content type and the camera content type and the content type associated with the lowest of the rate-distortion costs is determined as the content type for the video data. For example, determining the content type for
the video data can include encoding a first copy of a portion of the video data, such as a current CU, according to one or more blending indices indicative of blending area sizes available for blending prediction signals of geometric partitions of the current CU (e.g., the first subset of blending indices, as described below) and encoding a second copy of the portion of the video data (e.g., the same current CU) according to one or more blending indices (e.g., the second subset of blending indices, as described below). A first cost (e.g., a first rate-distortion cost) can be determined for the encoded first copy and a second cost (e.g., a second rate-distortion cost) can be determined for the encoded second copy. The content type for the video data may thus be determined as the one of the screen content type or the camera content type based on a lowest one of the first cost and the second cost. In some implementations, the encoded first copy and the encoded second copy may each be produced using all blending indices instead of using only a subset of blending indices.
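A minimal encoder-side sketch of this rate-distortion based derivation is shown below; encode_with_indices() and rd_cost() are hypothetical helpers standing in for the actual encoding and cost measurement, which are not specified here.

```python
def derive_content_type_encoder(current_cu, encode_with_indices, rd_cost,
                                screen_subset=(0, 1, 2), camera_subset=(2, 3, 4)):
    """Encode one copy of the current CU with each subset of blending indices
    and pick the content type whose encoding yields the lower rate-distortion
    cost."""
    screen_copy = encode_with_indices(current_cu, screen_subset)
    camera_copy = encode_with_indices(current_cu, camera_subset)
    return "screen" if rd_cost(screen_copy) <= rd_cost(camera_copy) else "camera"
```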
[0102] During decoding, the content type can be determined based on motion vectors or prediction samples included in the encoded bitstream. Determining the content type based on motion vectors in the bitstream can include determining whether the motion vectors of a current CU are all at integer positions. If so, the content type is determined as the screen content type and thus the video data is determined to be screen content. Otherwise, the content type is determined as the camera content type and thus the video data is determined to be camera content. This may, for example, be because traversals of screen content typically move on a fixed pixel basis (e.g., 30 pixels of motion each time a page is scrolled). Thus, where a motion vector is at a fractional position, it may indicate that the content is camera content. In one example, if a geometric partition of a CU is intra-predicted, the motion vector of the geometric partition may be set to (0,0) (i.e., an integer position) and thus the video data may be determined to be screen content. In another example, if any geometric partition of the CU is intra-predicted, the video data may be determined to be camera content.
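The integer-position check can be sketched as follows, assuming motion vectors stored in quarter-sample units; the actual motion vector precision is an assumption made only for this illustration.

```python
def derive_content_type_from_mvs(motion_vectors):
    """If every motion vector of the current CU is at an integer position,
    treat the video data as screen content; otherwise treat it as camera
    content.  Quarter-sample units are assumed, so integer positions are
    multiples of 4."""
    for mv_x, mv_y in motion_vectors:
        if mv_x % 4 != 0 or mv_y % 4 != 0:
            return "camera"
    return "screen"
```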
[0103] Alternatively, determining the content type based on motion vectors in the bitstream can include using the absolute motion vector difference between the two geometric partitions of a current CU. Thus, in some implementations, determining the content type of the video data can include determining the content type after geometric partitioning is performed for one or more CUs of the video data. For example, where the absolute motion vector difference between the two geometric partitions for a current CU meets or is above a threshold, the content type may be determined as the screen content type and thus the video data may be determined to be screen content. Otherwise, where the absolute motion vector difference between the two geometric partitions for the current CU is below the threshold, the content type may be determined as the camera content type and thus the video data may be determined to be camera content. The
threshold may, for example, be predefined (e.g., determined during decoding or signaled in a bitstream). For example, the threshold may be signaled within a sequence parameter set (SPS), picture parameter set (PPS), a picture header, or a slice header of the bitstream. In some cases, intra prediction may be used for a geometric partition, in which case, to determine the content type using a motion vector for that geometric partition, the motion vector may be set to (0, 0). [0104] Determining the content type based on prediction samples in the bitstream can include comparing values of prediction samples at certain positions in the blending area (i.e., an area along the splitting line between geometric partitions of a current CU), in which each position includes one or more samples on each side of the splitting line. In particular, a difference is computed between those prediction samples. The difference between the prediction samples indicates whether a weighted average of prediction samples, as may be used with camera content, is close enough to the actual object shown in the current CU. Where the maximum absolute value of that difference meets or is above a threshold, the content type is determined as the screen content type and thus the video data is determined to be screen content. However, where the absolute value of that difference is below the threshold, the content type is determined as the camera content type and thus the video data is determined to be camera content. Alternatively, if the sum of absolute values of the differences between prediction samples at various positions in the blending area meets or is above the threshold, the content type may be determined as the screen content type and thus the video data is determined to be screen content; otherwise, the content type may be determined as the camera content type and thus the video data is determined to be camera content. As a further alternative, if the standard deviation of the difference between prediction samples meets or is above the threshold, or if the ratio between the standard deviation and the average of the absolute differences between the prediction samples meets or is above the threshold, the content type may be determined as the screen content type and thus the video data is determined to be screen content; otherwise, the content type may be determined as the camera content type and thus the video data is determined to be camera content. The threshold may, for example, be predefined (e.g., determined during decoding or signaled in a bitstream). For example, the threshold may be signaled within a SPS, a PPS, a picture header, or a slice header of the bitstream.
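A minimal sketch of the prediction-sample based variant, using the maximum absolute difference criterion, follows; how the sample positions along the splitting line are chosen and how the threshold is obtained are left abstract here.

```python
def derive_content_type_from_samples(sample_pairs, threshold):
    """sample_pairs holds (side_a, side_b) prediction sample values taken at
    positions along the splitting line, one value from each side.  A large
    maximum absolute difference suggests a sharp edge and thus screen
    content."""
    max_abs_diff = max(abs(a - b) for a, b in sample_pairs)
    return "screen" if max_abs_diff >= threshold else "camera"
```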
[0105] In some implementations, the content type can be determined during decoding by decoding one or more syntax elements indicative of the content type from the bitstream. For example, the one or more syntax elements indicative of the content type may be stored in connection with a current CU (i.e., at the CU-level or in a slice header or picture header corresponding to the CU) or in a parameter set for the bitstream (e.g., in a SPS or PPS for some or all of the bitstream). Deriving the content type in this manner results in the decoder relying
entirely upon the content type derivation performed at an encoder used to produce the bitstream. [0106] At 1004, a blending index is determined for geometric partitions of a CU of the video data to encode or decode based on the content type determined for the video data. As mentioned above, a first subset of blending indices is used where the content type is the screen content type, and a second subset of blending indices is used where the content type is the camera content type. The first subset of blending indices includes a first blending index corresponding to a quarter of a blending area size (e.g., of a default blending area size), a second blending index corresponding to a half of the blending area size, and a third blending index corresponding to the blending area size. The second subset of blending indices includes the third blending index, a fourth blending index corresponding to double the blending area size, and a fifth blending index corresponding to quadruple the blending area size.
[0107] Generally, where the content type is determined to be the screen content type, the blending index determined for the geometric partitions of the CU is the first blending index, and, where the content type is determined to be the camera content type, the blending index determined is the third blending index. This is because the first blending index uses a smallest blending area for the first subset of blending indices corresponding to the screen content type and the third blending index uses a smallest blending area for the second subset of blending indices corresponding to the camera content type. However, in some implementations, after the content type of the video data is determined, the determined content type may be verified, such as by repeating one or more of the above operations described for determining the content type using one or more other CUs. In the event a different content type is determined, thus indicating a low or otherwise not high confidence of the original content type determination, the blending index determined for the current CU may be the third blending index regardless of the subset of blending indices from which it is determined (i.e., regardless of the determined content type).
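Collecting the above into one non-limiting sketch, with the same illustrative index numbering (first blending index 0, third blending index 2):

```python
def determine_blending_index(content_type, content_type_verified=True,
                             first_index=0, third_index=2):
    """Pick the blending index for the geometric partitions of a CU: the
    smallest blending area of the subset for the derived content type, or
    the shared third blending index when a verification pass disagrees with
    the original content type determination."""
    if not content_type_verified:
        return third_index
    return first_index if content_type == "screen" else third_index
```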
[0108] At 1006, blending is performed against prediction signals of the geometric partitions according to the blending index determined therefor to produce a prediction unit. To perform the blending, two integer blending matrices W0 and W1 are used, in which each of those matrices corresponds to a different one of the geometric partitions and thus is used for a different one of the prediction signals. The weights in W0 and W1 are derived from a ramp function based on a displacement from a predicted sample position to the boundary (i.e., splitting line) between the geometric partitions.
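A rough sketch of this blending step is given below using normalized floating-point weights from a clipped ramp of the signed distance to the splitting line; the actual derivation uses integer weight matrices, whose precision and rounding are not reproduced here.

```python
def blend_prediction_signals(pred0, pred1, distances, blend_width):
    """Blend two per-sample prediction signals.  distances holds the signed
    displacement of each sample position from the splitting line (positive
    toward partition 0); blend_width is the blending area size on each side
    of the boundary."""
    blended = []
    for p0, p1, d in zip(pred0, pred1, distances):
        # Clipped ramp: weight 1 deep inside partition 0, weight 0 deep
        # inside partition 1, linear transition across the blending area.
        w0 = min(max((d + blend_width) / (2.0 * blend_width), 0.0), 1.0)
        blended.append(w0 * p0 + (1.0 - w0) * p1)
    return blended
```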
[0109] At 1008, the prediction unit is encoded (e.g., to a bitstream) or decoded (e.g., for output within an output video stream), based on whether the technique 1000 is performed during encoding or decoding. In some cases, the prediction unit may be used as reference data for the prediction of one or more other CUs, whether in a same frame as the CU which includes the
geometric partitions based on which the prediction unit was produced or a different frame.
[0110] In some implementations, the technique 1000 includes signaling, based on the content type, an indication of the blending index used to perform the blending against the prediction signals of the geometric partitions. Because the first blending index provides the smallest blending area size for screen content from amongst the blending indices in the first subset and the third blending index provides the smallest blending area size for camera content from amongst the blending indices in the second subset, the first blending index is the most frequently used blending index in the first subset of blending indices and the third blending index is the most frequently used blending index in the second subset of blending indices. Thus, a smallest number of bins is used to signal the first blending index as the blending index where the content type is the screen content type or the third blending index as the blending index where the content type is the camera content type, and the other blending indices for each content type may be signaled using a larger number of bins. For example, one bin may be used to signal the use of the first blending index for screen content or the use of the third blending index for camera content, while two bins may be used to signal the use of the second or third blending index for screen content or the use of the fourth or fifth blending index for camera content.
[0111] In some implementations, a constraint or other configuration of the coder used to perform the prediction of the CU which includes the geometric partitions may limit the available blending indices for each content type to a single blending index (e.g., the first blending index for screen content and the third blending index for camera content). In such a case, the blending index for the CU is determined based on the content type alone, and the technique 1000 may accordingly omit signaling the blending index.
[0112] In some implementations, a flag may be signaled to indicate whether the content type determined for the video data is correct. For example, and as mentioned above, the content type for the video data (or a different portion thereof, such as a different CU) can be determined one or more times after the initial determination. In the event one or more subsequent content type determinations are contrary to the initial determination, a flag may be signaled to indicate that the initial determination is incorrect; however, where one or more subsequent content type determinations (e.g., all of them) are consistent with the initial determination, a flag may be signaled to indicate that the initial determination is correct. In one particular example, the flag may be signaled at the CU-level for each CU, in which the content type determination may be repeated for each CU.
[0113] In some implementations, the blending index used for a geometric partition may be signaled alongside information used to produce the prediction signal for that geometric partition. For example, the blending index may be signaled alongside one or more of the motion vector
used to predict the motion of a geometric partition, an angle of the geometric partition (e.g., of the splitting line separating the geometric partitions of the CU), or an offset indicating a location of the splitting line for the geometric partitions.
[0114] The aspects of encoding and decoding described above illustrate some examples of encoding and decoding techniques. However, it is to be understood that encoding and decoding, as those terms are used in the claims, could mean compression, decompression, transformation, or any other processing or change of data.
[0115] The word “example” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “example” is not necessarily to be construed as being preferred or advantageous over other aspects or designs. Rather, use of the word “example” is intended to present concepts in a concrete fashion. As used in this application, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or.” That is, unless specified otherwise or clearly indicated otherwise by the context, the statement “X includes A or B” is intended to mean any of the natural inclusive permutations thereof. That is, if X includes A; X includes B; or X includes both A and B, then “X includes A or B” is satisfied under any of the foregoing instances. In addition, the articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more,” unless specified otherwise or clearly indicated by the context to be directed to a singular form. Moreover, use of the term “an implementation” or the term “one implementation” throughout this disclosure is not intended to mean the same implementation unless described as such.
[0116] Implementations of the transmitting station 102 and/or the receiving station 106 (and the algorithms, methods, instructions, etc., stored thereon and/or executed thereby, including by the encoder 400 and the decoder 500, or another encoder or decoder as disclosed herein) can be realized in hardware, software, or any combination thereof. The hardware can include, for example, computers, intellectual property (IP) cores, application-specific integrated circuits (ASICs), programmable logic arrays, optical processors, programmable logic controllers, microcode, microcontrollers, servers, microprocessors, digital signal processors, or any other suitable circuit. In the claims, the term “processor” should be understood as encompassing any of the foregoing hardware, either singly or in combination. The terms “signal” and “data” are used interchangeably. Further, portions of the transmitting station 102 and the receiving station 106 do not necessarily have to be implemented in the same manner.
[0117] Further, in one aspect, for example, the transmitting station 102 or the receiving station 106 can be implemented using a general purpose computer or general purpose processor with a computer program that, when executed, carries out any of the respective methods, algorithms, and/or instructions described herein. In addition, or alternatively, for example, a
special purpose computer/processor can be utilized which can contain other hardware for carrying out any of the methods, algorithms, or instructions described herein.
[0118] The transmitting station 102 and the receiving station 106 can, for example, be implemented on computers in a video conferencing system. Alternatively, the transmitting station 102 can be implemented on a server, and the receiving station 106 can be implemented on a device separate from the server, such as a handheld communications device. In this instance, the transmitting station 102 can encode content into an encoded video signal and transmit the encoded video signal to the communications device. In turn, the communications device can then decode the encoded video signal. Alternatively, the communications device can decode content stored locally on the communications device, for example, content that was not transmitted by the transmitting station 102. Other suitable transmitting and receiving implementation schemes are available. For example, the receiving station 106 can be a generally stationary personal computer rather than a portable communications device.
[0119] Further, all or a portion of implementations of this disclosure can take the form of a computer program product accessible from, for example, a computer-usable or computer- readable medium. A computer-usable or computer-readable medium can be any device that can, for example, tangibly contain, store, communicate, or transport the program for use by or in connection with any processor. The medium can be, for example, an electronic, magnetic, optical, electromagnetic, or semiconductor device. Other suitable mediums are also available. [0120] The above-described implementations and other aspects have been described in order to facilitate easy understanding of this disclosure and do not limit this disclosure. On the contrary, this disclosure is intended to cover various modifications and equivalent arrangements included within the scope of the appended claims, which scope is to be accorded the broadest interpretation as is permitted under the law so as to encompass all such modifications and equivalent arrangements.
Claims
1. A method for content derivation for geometric partitioning mode video coding, the method comprising: determining a content type for video data to encode or decode; determining a blending index for geometric partitions of a coding unit of the video data based on the content type; performing blending against prediction signals of the geometric partitions according to the blending index to produce a prediction unit; and encoding or decoding the prediction unit.
2. The method of claim 1, wherein the blending index is determined from a subset of blending indices associated with the content type.
3. The method of claim 2, wherein a first subset of blending indices is associated with a screen content type and a second subset of blending indices is associated with a camera content type.
4. The method of claim 3, wherein the first subset of blending indices includes a first blending index corresponding to a quarter of a blending area size, a second blending index corresponding to a half of the blending area size, and a third blending index corresponding to the blending area size, and wherein the second subset of blending indices includes the third blending index, a fourth blending index corresponding to double the blending area size, and a fifth blending index corresponding to quadruple the blending area size.
5. The method of claim 4, wherein, where the content type is the screen content type, the first blending index is signaled using one bin, and each of the second blending index and the third blending index is signaled using more than one bin.
6. The method of claim 4, wherein, where the content type is the camera content type, the third blending index is signaled using one bin, and each of the fourth blending index and the fifth blending index is signaled using more than one bin.
7. The method of any of claims 1, 2, 3, 4, 5, or 6, wherein, during decoding,
determining the content type for the video data comprises: decoding one or more syntax elements signaled within a bitstream associated with the video data; and determining the content type based on the decoded one or more syntax elements.
8. The method of any of claims 1, 2, 3, 4, 5, or 6, wherein, during decoding, the content type is determined using one or both of a prediction sample or a motion vector for a coding unit associated with the video data.
9. The method of any of claims 1, 2, 3, 4, 5, or 6, wherein, during encoding, the method comprises: signaling, within a bitstream which includes the encoded coding unit, one or both of an indication of the content type for the video data or an indication of the blending index.
10. An apparatus for content derivation for geometric partitioning mode video coding, the apparatus comprising: one or more memories; and one or more processors configured to execute instructions stored in the one or more memories to: determine, from a subset of blending indices associated with a content type for video data to encode or decode, a blending index for geometric partitions of a coding unit of the video data; produce a prediction unit by performing blending against prediction signals of the geometric partitions according to the blending index; and encode or decode the prediction unit.
11. The apparatus of claim 10, wherein the one or more processors are configured to execute the instructions to: determine the content type as one of a screen content type or a camera content type.
12. The apparatus of claim 11, wherein the subset of blending indices is a first subset of blending indices where the content type is the screen content type or a second subset of blending indices where the content type is the camera content type.
13. The apparatus of claim 12, wherein the first subset of blending indices includes a
first blending index corresponding to a quarter of a blending area size and signaled using one bin, a second blending index corresponding to a half of the blending area size and signaled using more than one bin, and a third blending index corresponding to the blending area size and signaled using more than one bin, and wherein the second subset of blending indices includes the third blending index, a fourth blending index corresponding to double the blending area size and signaled using more than one bin, and a fifth blending index corresponding to quadruple the blending area size and signaled using more than one bin.
14. The apparatus of claims 11, 12, or 13, wherein, to determine the content type as the one of the screen content type or the camera content type, the one or more processors execute the instructions to: determine the content type based on one or more syntax elements signaled from a bitstream associated with the video data.
15. The apparatus of any of claims 10, 11, 12, or 13, wherein the one or more processors are configured to execute the instructions to: signal the content type within a bitstream to which the prediction unit is encoded.
16. A non-transitory computer-readable storage device including program instructions executable by one or more processors that, when executed, cause the one or more processors to perform operations for content derivation for geometric partitioning mode video coding, the operations comprising: determining a content type for video data to encode or decode; producing a prediction unit by performing blending against prediction signals of the geometric partitions of a coding unit of the video data according to a blending index determined, based on the content type, for the geometric partitions; and outputting the prediction unit for encoding or decoding.
17. The non-transitory computer-readable storage device of claim 16, the operations comprising: determining the blending index from a subset of blending indices associated with the content type.
18. The non-transitory computer-readable storage device of claim 17, wherein the
subset of blending indices is a first subset of blending indices where the content type is a screen content type or a second subset of blending indices where the content type is a camera content type.
19. The non-transitory computer-readable storage device of claim 18, wherein the first subset of blending indices includes a first blending index corresponding to a quarter of a blending area size, a second blending index corresponding to a half of the blending area size, and a third blending index corresponding to the blending area size, and wherein the second subset of blending indices includes the third blending index, a fourth blending index corresponding to double the blending area size, and a fifth blending index corresponding to quadruple the blending area size.
20. The non-transitory computer-readable storage device of claim 19, wherein, where the content type is the screen content type, the first blending index is signaled using one bin, and each of the second blending index and the third blending index is signaled using more than one bin, and wherein, where the content type is the camera content type, the third blending index is signaled using one bin, and each of the fourth blending index and the fifth blending index is signaled using more than one bin.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263435440P | 2022-12-27 | 2022-12-27 | |
US63/435,440 | 2022-12-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024145086A1 true WO2024145086A1 (en) | 2024-07-04 |
Family
ID=89834201
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2023/084924 WO2024145086A1 (en) | 2022-12-27 | 2023-12-19 | Content derivation for geometric partitioning mode video coding |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024145086A1 (en) |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023205740A1 (en) * | 2022-04-20 | 2023-10-26 | Tencent America LLC | . adaptive blending for geometric partition mode (gpm) |
Non-Patent Citations (7)
Title |
---|
BIN LI ET AL: "A Fast Algorithm for Adaptive Motion Compensation Precision in Screen Content Coding", DATA COMPRESSION CONFERENCE. PROCEEDINGS, IEEE COMPUTER SOCIETY, PISCATAWAY, NJ, US, 7 April 2015 (2015-04-07), pages 243 - 252, XP032963972, ISSN: 1068-0314, [retrieved on 20150702], DOI: 10.1109/DCC.2015.17 * |
GAO H ET AL: "Non-EE2: Adaptive Blending for GPM", no. JVET-Z0137 ; m59470, 14 April 2022 (2022-04-14), XP030301022, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/26_Teleconference/wg11/JVET-Z0137-v3.zip JVET-Z0137.docx> [retrieved on 20220414] * |
GAO H ET AL: "Non-EE2: Adaptive Blending for GPM", no. JVET-Z0137 ; m59470, 23 April 2022 (2022-04-23), XP030301024, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/26_Teleconference/wg11/JVET-Z0137-v3.zip JVET-Z0137.pptx> [retrieved on 20220423] * |
HAOMING CHEN ET AL: "On Intra Prediction for Screen Content Video Coding", ARXIV.ORG, 10 November 2015 (2015-11-10), pages 1 - 13, XP055940748, Retrieved from the Internet <URL:https://doi.org/10.48550/arXiv.1511.01862> [retrieved on 20220711], DOI: 10.48550/ARXIV.1511.01862 * |
KIDANI (KDDI) Y ET AL: "EE2-2.7: GPM adaptive blending (JVET-Z0059, JVET-Z0137)", no. JVET-AA0058 ; m60028, 11 July 2022 (2022-07-11), XP030302752, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/27_Teleconference/wg11/JVET-AA0058-v2.zip JVET-AA0058-v2_clean.docx> [retrieved on 20220711] * |
KIDANI (KDDI) Y ET AL: "Non-EE2: Adaptive width for GPM blending area", no. JVET-Z0059 ; m59373, 20 April 2022 (2022-04-20), XP030300817, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/26_Teleconference/wg11/JVET-Z0059-v2.zip JVET-Z0059.docx> [retrieved on 20220420] * |
KIDANI (KDDI) Y ET AL: "Non-EE2: Adaptive width for GPM blending area", no. JVET-Z0059 ; m59373, 20 April 2022 (2022-04-20), XP030300818, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/26_Teleconference/wg11/JVET-Z0059-v2.zip JVET-Z0059.pptx> [retrieved on 20220420] * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10798408B2 (en) | Last frame motion vector partitioning | |
EP2774360B1 (en) | Differential pulse code modulation intra prediction for high efficiency video coding | |
US8798131B1 (en) | Apparatus and method for encoding video using assumed values with intra-prediction | |
EP4322531A2 (en) | An encoder, a decoder and corresponding methods harmonizing matrix-based intra prediction and secoundary core selection | |
WO2018208349A1 (en) | Directional intra-prediction coding | |
US11979564B2 (en) | Adaptive filter intra prediction modes in image/video compression | |
US10506256B2 (en) | Intra-prediction edge filtering | |
US9693066B1 (en) | Object-based intra-prediction | |
US10567772B2 (en) | Sub8×8 block processing | |
WO2018118153A1 (en) | Non-causal overlapped block prediction in variable block size video coding | |
WO2024145086A1 (en) | Content derivation for geometric partitioning mode video coding | |
US11388401B2 (en) | Extended transform partitions for video compression | |
WO2024081010A1 (en) | Region-based cross-component prediction | |
WO2024081011A1 (en) | Filter coefficient derivation simplification for cross-component prediction | |
WO2024158769A1 (en) | Hybrid skip mode with coded sub-block for video coding | |
WO2022116054A1 (en) | Image processing method and system, video encoder, and video decoder | |
WO2024173325A1 (en) | Wiener filter design for video coding | |
WO2023239347A1 (en) | Enhanced multi-stage intra prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23848305 Country of ref document: EP Kind code of ref document: A1 |