
US20130272423A1 - Transform coefficient coding - Google Patents

Transform coefficient coding

Info

Publication number
US20130272423A1
Authority
US
United States
Prior art keywords
contexts
scan
block
scan order
transform coefficients
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/862,818
Inventor
Wei-Jung Chien
Joel Sole Rojals
Jianle Chen
Rajan Laxman Joshi
Marta Karczewicz
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Inc filed Critical Qualcomm Inc
Priority to US13/862,818 priority Critical patent/US20130272423A1/en
Priority to SG11201405856XA priority patent/SG11201405856XA/en
Priority to CA2869305A priority patent/CA2869305A1/en
Priority to CN201380019906.1A priority patent/CN104247420A/en
Priority to AU2013249427A priority patent/AU2013249427A1/en
Priority to TW102113542A priority patent/TW201352004A/en
Priority to RU2014145851A priority patent/RU2014145851A/en
Priority to KR20147031985A priority patent/KR20150003327A/en
Priority to EP13718986.6A priority patent/EP2839646A1/en
Priority to PCT/US2013/036779 priority patent/WO2013158642A1/en
Priority to JP2015505990A priority patent/JP2015516768A/en
Assigned to QUALCOMM INCORPORATED reassignment QUALCOMM INCORPORATED ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CHEN, JIANLE, CHIEN, WEI-JUNE, JOSHI, RAJAN LAXMAN, KARCZEWICZ, MARTA, SOLE ROJALS, JOEL
Assigned to QUALCOMM INCORPORATED reassignment QUALCOMM INCORPORATED CORRECTIVE ASSIGNMENT TO CORRECT THE SPELLING OF FIRST CONVEYING PARTY FROM WEI-JUNE CHIEN TO WEI-JUNG CHIEN. PREVIOUSLY RECORDED ON REEL 030413 FRAME 0600. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT.. Assignors: CHEN, JIANLE, CHIEN, WEI-JUNG, JOSHI, RAJAN LAXMAN, KARCZEWICZ, MARTA, SOLE ROJALS, JOEL
Publication of US20130272423A1 publication Critical patent/US20130272423A1/en
Priority to IL234708A priority patent/IL234708A0/en
Priority to PH12014502144A priority patent/PH12014502144A1/en
Priority to ZA2014/07860A priority patent/ZA201407860B/en
Priority to HK15101986.7A priority patent/HK1201661A1/en

Classifications

    • H04N19/00775
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
    • H03M7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/129 Scanning of coding units, e.g. zig-zag scan of transform coefficients or flexible macroblock ordering [FMO]
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/167 Position within a video image, e.g. region of interest [ROI]
    • H04N19/18 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a set of transform coefficients
    • H03M7/4018 Context adaptive binary arithmetic codes [CABAC]

Definitions

  • This disclosure relates to video coding and, more particularly, to techniques for coding syntax elements associated with transform coefficients used in video coding.
  • Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called “smart phones,” video teleconferencing devices, video streaming devices, and the like.
  • Digital video devices implement video compression techniques defined according to video coding standards. Digital video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video compression techniques.
  • Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions.
  • High-Efficiency Video Coding (HEVC) is a video coding standard being developed by the Joint Collaboration Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and ISO/IEC Moving Picture Experts Group (MPEG).
  • Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences.
  • For block-based video coding, a video slice (i.e., a video frame or a portion of a video frame) may be partitioned into video blocks, which may also be referred to as treeblocks, coding units (CUs), and/or coding nodes.
  • Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture.
  • Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures.
  • Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.
  • this disclosure describes techniques for encoding and decoding data representing syntax elements (e.g., significance flags) associated with transform coefficients of a block.
  • a video encoder and a video decoder each determines contexts to be used for context adaptive binary arithmetic coding (CABAC).
  • the video encoder and the video decoder determine a scan order for the block, and determine the contexts based on the scan order.
  • the video decoder determines contexts that are the same for two or more scan orders, and different contexts for other scan orders.
  • the video encoder determines contexts that are the same for the two or more scan orders, and different contexts for the other scan orders.
  • the disclosure describes a method for encoding video data.
  • The method comprises determining a scan order for transform coefficients of a block, determining contexts for significance flags of the transform coefficients of the block based on the determined scan order, context adaptive binary arithmetic coding (CABAC) encoding the significance flags of the transform coefficients based at least on the determined contexts, and signaling the encoded significance flags in a coded bitstream.
  • the disclosure describes an apparatus for coding video data.
  • the apparatus comprises a video coder configured to determine a scan order for transform coefficients of a block, determine contexts for significance flags of the transform coefficients of the block based on the determined scan order, and context adaptive binary arithmetic coding (CABAC) code the significance flags of the transform coefficients based at least on the determined contexts.
  • the disclosure describes an apparatus for coding video data.
  • the apparatus comprises means for determining a scan order for transform coefficients of a block, means for determining contexts for significance flags of the transform coefficients of the block based on the determined scan order, and means for context adaptive binary arithmetic coding (CABAC) the significance flags of the transform coefficients based at least on the determined contexts.
  • the disclosure describes a computer-readable storage medium.
  • the computer-readable storage medium having instructions stored thereon that when executed cause one or more processors of an apparatus for coding video data to determine a scan order for transform coefficients of a block, determine contexts for significance flags of the transform coefficients of the block based on the determined scan order, and context adaptive binary arithmetic coding (CABAC) code the significance flags of the transform coefficients based at least on the determined contexts.
  • FIGS. 1A-1C are conceptual diagrams illustrating examples of scan orders of a block that includes transform coefficients.
  • FIG. 2 is a conceptual diagram illustrating a mapping of transform coefficients to significance syntax elements.
  • FIG. 3 is a block diagram illustrating an example video encoding and decoding system that may utilize techniques described in this disclosure.
  • FIG. 4 is a block diagram illustrating an example video encoder that may implement techniques described in this disclosure.
  • FIG. 5 is a block diagram illustrating an example of an entropy encoder that may implement techniques for entropy encoding syntax elements in accordance with this disclosure.
  • FIG. 6 is a flowchart illustrating an example process for encoding video data according to this disclosure.
  • FIG. 7 is a block diagram illustrating an example video decoder that may implement techniques described in this disclosure.
  • FIG. 8 is a block diagram illustrating an example of an entropy decoder that may implement techniques for decoding syntax elements in accordance with this disclosure.
  • FIG. 9 is a flowchart illustrating an example process of decoding video data according to this disclosure.
  • FIG. 10 is a conceptual diagram illustrating positions of a last significant coefficient depending on the scan order.
  • FIG. 11 is a conceptual diagram illustrating use of a diagonal scan in place of an original horizontal scan.
  • FIG. 12 is a conceptual diagram illustrating a context neighborhood for a nominal horizontal scan.
  • a video encoder determines transform coefficients for a block, encodes syntax elements, that indicate the values of the transform coefficients, using context adaptive binary arithmetic coding (CABAC), and signals the encoded syntax elements in a bitstream.
  • a video decoder receives the bitstream that includes the encoded syntax elements that indicate the values of the transform coefficients and CABAC decodes the syntax elements to determine the transform coefficients for the block.
  • the video encoder and video decoder determine which contexts are to be used to perform CABAC encoding and CABAC decoding, respectively.
  • the video encoder and the video decoder may determine which contexts to use to perform CABAC encoding or CABAC decoding based on a scan order of the block of the transform coefficients.
  • the video encoder and the video decoder may determine which contexts to use to perform CABAC encoding or CABAC decoding based on a size of the block, positions of the transform coefficients within the block, and the scan order.
  • the video encoder and the video decoder may utilize different contexts for different scan orders (i.e., a first set of contexts for horizontal scan, a second set of contexts for vertical scan, and a third set of contexts for diagonal scan).
  • In some examples, if the block of transform coefficients is scanned vertically or horizontally, the video encoder and the video decoder may utilize the same contexts for both of these scan orders (e.g., for a particular position of a transform coefficient).
  • the techniques described in this disclosure may exploit the statistical behavior of the magnitudes of the transform coefficients in a way that achieves better video compression, as compared to other techniques. For instance, it may be possible for the video encoder and the video decoder to determine which contexts to use for CABAC encoding or CABAC decoding based on the position of the transform coefficient, irrespective of the scan order. However, the scan order may have an effect on the ordering of the transform coefficients.
  • the block of transform coefficients may be a two-dimensional (2D) block of coefficients that the video encoder scans to construct a one-dimensional (1D) vector, and the video encoder entropy encodes (using CABAC) the values of the transform coefficients in the 1D vector.
  • the order in which the video encoder places the values (e.g., magnitudes) of the transform coefficients in the 1D vector is a function of the scan order.
  • the order in which the video encoder places the magnitudes of the transform coefficients for a diagonal scan may be different than the order in which the video encoder places the magnitudes of the transform coefficients for a vertical scan.
  • the position of the magnitudes of the transform coefficients may be different for different scan orders.
  • the position of the magnitudes of the transform coefficients may have an effect on coding efficiency.
  • the location of the last significant coefficient in the block may be different for different scan orders.
  • the magnitude of the last significant coefficient may be different for different scan orders.
  • these other techniques that determine contexts based on the position of the transform coefficient irrespective of the scan order fail to properly account for the potential that the significance statistics for a transform coefficient in a particular position may vary depending on the scan order.
  • the video encoder and video decoder may determine the scan order for the block, and determine contexts based on the determined scan order (and in some examples, also based on the positions of the transform coefficients and possibly the size of the block). This way, the video encoder and video decoder may better account for the significance statistics for determining which contexts to use as compared to techniques that do not rely on the scan order and rely only on the position for determining which contexts to use.
  • the video encoder and the video decoder may use five coding passes to encode or decode transform coefficients of a block, namely, (1) a significance pass, (2) a greater than one pass, (3) a greater than two pass, (4) a sign pass, and (5) a coefficient level remaining pass.
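  • As an illustration of the pass structure described above, the following is a minimal sketch (with hypothetical names such as encode_block and emit; it is not the actual HEVC or patent syntax) of how an encoder might walk the five passes over one block:

```python
# Hypothetical sketch of the five coding passes over one transform block.
# coeffs maps (row, col) -> integer coefficient level; scan is a list of
# (row, col) positions in the scan order used for the block.

def emit(name, value):
    print(name, value)  # stand-in for CABAC encoding of the syntax element

def encode_block(coeffs, scan):
    for pos in scan:                              # (1) significance pass
        emit("sig_flag", 1 if coeffs.get(pos, 0) != 0 else 0)
    for pos in scan:                              # (2) greater-than-one pass
        if coeffs.get(pos, 0) != 0:
            emit("gt1_flag", 1 if abs(coeffs[pos]) > 1 else 0)
    for pos in scan:                              # (3) greater-than-two pass
        if abs(coeffs.get(pos, 0)) > 1:
            emit("gt2_flag", 1 if abs(coeffs[pos]) > 2 else 0)
    for pos in scan:                              # (4) sign pass
        if coeffs.get(pos, 0) != 0:
            emit("sign_flag", 1 if coeffs[pos] < 0 else 0)
    for pos in scan:                              # (5) coefficient level remaining pass
        if abs(coeffs.get(pos, 0)) > 2:
            emit("level_remaining", abs(coeffs[pos]) - 3)

encode_block({(0, 0): 5, (0, 1): -1}, [(0, 0), (0, 1), (1, 0), (1, 1)])
```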
  • significance coding refers to generating syntax elements to indicate whether any of the coefficients within the block have an absolute value of one or greater. That is, a coefficient with an absolute value of one or greater is considered “significant.”
  • the other coding passes are described in more detail below.
  • the video encoder determines syntax elements that indicate whether a transform coefficient is significant. Syntax elements that indicate whether a transform coefficient is significant are referred to herein as significance syntax elements.
  • One example of a significance syntax element is a significance flag, where a value of 0 for the significance flag indicates that the coefficient is not significant (i.e., the value of the transform coefficient is 0) and a value of 1 for the significance flag indicates that the coefficient is significant (i.e., the value of the transform coefficient is non-zero).
  • the video encoder scans the transform coefficients of a block, or part of the block (if the position of the last significant coefficient was previously determined and signaled to the decoder), and determines the significance syntax element for each transform coefficient.
  • Examples of the scan order include a horizontal scan, a vertical scan, and a diagonal scan.
  • the video encoder CABAC encodes the significance syntax elements and signals the encoded significance syntax elements in a coded bitstream.
  • Other types of scans, such as zig-zag scans, adaptive or partially adaptive scans may also be used in some examples.
  • binarization may be applied to a syntax element to form a series of one or more bits, which are referred to as “bins.”
  • a coding context may be associated with a bin of the syntax element.
  • the coding context may identify probabilities of coding bins having particular values. For instance, a coding context may indicate a 0.7 probability of coding a 0-valued bin (representing an example of a “most probable symbol,” in this instance) and a 0.3 probability of coding a 1-valued bin.
  • a bin may be arithmetically coded based on the context.
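  • To make the idea of a context concrete, the following is a minimal sketch of an adaptive probability model; the class name and update rule are hypothetical illustrations, not the actual CABAC estimator, which uses a table-driven finite-state machine rather than floating-point arithmetic:

```python
# Minimal adaptive context model: tracks an estimate of P(bin == 0) and
# nudges it toward each observed bin value. Real CABAC instead keeps a
# small integer state that indexes precomputed probability tables.

class Context:
    def __init__(self, p0=0.5, rate=0.05):
        self.p0 = p0      # current estimate of the probability of a 0-valued bin
        self.rate = rate  # adaptation speed

    def update(self, bin_value):
        observed_zero = 1.0 if bin_value == 0 else 0.0
        self.p0 += self.rate * (observed_zero - self.p0)

ctx = Context(p0=0.7)     # e.g., a 0.7 probability of coding a 0-valued bin
for b in (0, 0, 1, 0):
    ctx.update(b)
print(round(ctx.p0, 3))   # the estimate drifts with the coded bins
```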
  • contexts associated with a particular syntax element or bins thereof may be dependent on other syntax elements or coding parameters.
  • the video encoder may determine which contexts to use for the CABAC encoding based on the scan order.
  • the video encoder may use one set of contexts per scan order type. For example, if the block is a 4×4 block, there are sixteen coefficients.
  • the video encoder may utilize sixteen contexts for each scan, resulting in a total of forty-eight contexts (i.e., sixteen contexts for the horizontal scan, sixteen contexts for the vertical scan, and sixteen contexts for the diagonal scan).
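  • The context counts above can be expressed as a simple indexing rule; the scheme below is a hypothetical sketch of one dedicated context per (scan order, position) pair, not the codec's actual table layout:

```python
SCAN_ORDERS = ("horizontal", "vertical", "diagonal")

def context_index(scan_order, coeff_pos, block_width=4):
    """One context per position and per scan order: for a 4x4 block,
    16 positions x 3 scan orders = 48 contexts in total."""
    num_positions = block_width * block_width
    return SCAN_ORDERS.index(scan_order) * num_positions + coeff_pos

total = max(context_index(s, p) for s in SCAN_ORDERS for p in range(16)) + 1
print(total)  # 48 for a 4x4 block; the same rule gives 192 for an 8x8 block
```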
  • the video decoder receives the coded bitstream (e.g., from the video encoder directly or via a storage medium that stores the coded bitstream) and performs a reciprocal function, as that of the video encoder, to determine the values of the transform coefficients. For example, the video decoder implements the significance pass to determine which transform coefficients are significant based on the significance syntax elements in the received bitstream.
  • the video decoder may determine the scan order of the transform coefficients of the block (e.g., the scan order in which the transform coefficients were scanned).
  • the video decoder may determine which contexts to use for CABAC decoding the significance syntax elements based on the scan order (e.g., sixteen of the forty-eight contexts for a 4×4 block, or sixty-four of the 192 contexts for an 8×8 block). In this manner, the video decoder may select the same contexts for CABAC decoding that the video encoder selected for CABAC encoding.
  • the video decoder CABAC decodes the significance syntax elements based on the determined contexts.
  • the video encoder and the video decoder determined contexts based on the scan order, where the contexts were different for different scan orders, resulting in a total of forty-eight contexts for a 4×4 block and 192 contexts for an 8×8 block.
  • the techniques described in this disclosure are not limited in this respect.
  • the contexts that the video encoder and the video decoder use may be the same contexts for multiple (i.e., two or more) scan orders to allow for context sharing depending on scan order type.
  • the video encoder and the video decoder may determine contexts that are the same if the scan order is a horizontal scan or if the scan order is a vertical scan. In other words, the contexts are the same if the scan order is the horizontal scan or if the scan order is the vertical scan for a particular position of the transform coefficient within the block.
  • the video encoder and the video decoder may utilize different contexts for the diagonal scan. In this example, the number of contexts for the 4×4 block reduces from forty-eight contexts to thirty-two contexts, and for the 8×8 block from 192 contexts to 128 contexts, because the contexts for the horizontal scan and the vertical scan are the same, and there are different contexts for the diagonal scan.
  • the video encoder and the video decoder may use the same contexts for all scan order types, which reduces the contexts to sixteen for the 4×4 block and sixty-four for the 8×8 block.
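  • The reductions described above can be checked with a small counting sketch; the sharing-policy labels are hypothetical:

```python
def num_contexts(block_width, sharing):
    """Count contexts under three hypothetical sharing policies:
    'none'    - a separate context set per scan order (3 sets),
    'hor_ver' - horizontal and vertical share one set, diagonal has its own (2 sets),
    'all'     - a single set shared by every scan order (1 set)."""
    positions = block_width * block_width
    num_sets = {"none": 3, "hor_ver": 2, "all": 1}[sharing]
    return num_sets * positions

print(num_contexts(4, "none"), num_contexts(4, "hor_ver"), num_contexts(4, "all"))  # 48 32 16
print(num_contexts(8, "none"), num_contexts(8, "hor_ver"), num_contexts(8, "all"))  # 192 128 64
```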
  • using the same contexts for all scan order types may be a function of the block size.
  • the contexts may be different for the different scan orders, or two or more of the scan orders may share contexts.
  • the contexts for the horizontal and vertical scans may be the same (e.g., for a particular position), and different for the diagonal scan.
  • the contexts may be different for different scan orders.
  • the contexts for the 2D block and the 1D block may be different.
  • the contexts for the 2D block or the 1D block may be the same.
  • the video encoder and the video decoder may account for the size of the block. For instance, in the above example, the size of the block indicated whether all scan orders share contexts. In some examples, the video encoder and the video decoder may determine which contexts to use based on the size of the block and the scan order. In these examples, the techniques described in this disclosure may allow for context sharing. For instance, for a block with a first size, the video encoder and the video decoder may determine contexts that are the same if the block of the first size is scanned horizontally or if the block of the first size is scanned vertically. For a block with a second size, the video encoder and the video decoder may determine contexts that are the same if the block of the second size is scanned horizontally or if the block of the second size is scanned vertically.
  • In some examples, the video encoder and the video decoder determine a first set of contexts that are used for CABAC encoding or CABAC decoding for all scan orders. For certain sized blocks (e.g., 8×8), the video encoder and the video decoder determine a second set of contexts that are used for CABAC encoding or CABAC decoding for a diagonal scan, and a third set of contexts that are used for CABAC encoding or CABAC decoding for both a horizontal scan and a vertical scan. For certain sized blocks (e.g., 4×4), the video encoder and the video decoder determine a fourth set of contexts that are used for CABAC encoding or CABAC decoding for a diagonal scan, a horizontal scan, and a vertical scan.
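  • The size-dependent grouping in the example above can be sketched as a lookup from block size and scan order to a context-set identifier; the set names are hypothetical:

```python
def context_set(block_width, scan_order):
    """Hypothetical mapping following the example in the text: a 4x4 block
    shares one context set across all scan orders, while an 8x8 block uses
    one set for the diagonal scan and a shared set for the horizontal and
    vertical scans."""
    if block_width == 4:
        return "set_4x4_all_scans"
    if block_width == 8:
        return "set_8x8_diagonal" if scan_order == "diagonal" else "set_8x8_hor_ver"
    return "set_large_all_scans"

for w in (4, 8):
    for scan in ("horizontal", "vertical", "diagonal"):
        print(w, scan, "->", context_set(w, scan))
```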
  • the examples of determining contexts based on the scan order may be directed to intra-coding modes.
  • the transform coefficients may be the result from intra-coding, and the techniques described in this disclosure may be applicable to such transform coefficients.
  • the techniques described in this disclosure are not so limited and may be applicable for inter-coding or intra-coding.
  • FIGS. 1A-1C are conceptual diagrams illustrating examples of scan orders of a block that includes transform coefficients.
  • a block that includes transform coefficients may be referred to as a transform block (TB).
  • a transform block may be a block of a transform unit.
  • a transform unit includes three transform blocks and the corresponding syntax elements.
  • a transform unit may be a transform block of luma samples of size 8×8, 16×16, or 32×32, or four transform blocks of luma samples of size 4×4, with two corresponding transform blocks of chroma samples, of a picture that has three sample arrays; or a transform block of luma samples of size 8×8, 16×16, or 32×32, or four transform blocks of luma samples of size 4×4, of a monochrome picture or a picture that is coded using separate color planes; together with the syntax structures used to transform the transform block samples.
  • FIG. 1A illustrates a horizontal scan of 4×4 block 10 (e.g., TB 10) that includes transform coefficients 12A to 12P (collectively referred to as “transform coefficients 12”).
  • the horizontal scan starts from transform coefficient 12P and ends at transform coefficient 12A, and proceeds horizontally through the transform coefficients.
  • FIG. 1B illustrates a vertical scan of 4×4 block 14 (e.g., TB 14) that includes transform coefficients 16A to 16P (collectively referred to as “transform coefficients 16”).
  • the vertical scan starts from transform coefficient 16P and ends at transform coefficient 16A, and proceeds vertically through the transform coefficients.
  • FIG. 1C illustrates a diagonal scan of 4×4 block 18 (e.g., TB 18) that includes transform coefficients 20A to 20P (collectively referred to as “transform coefficients 20”).
  • the diagonal scan starts from transform coefficient 20P and ends at transform coefficient 20A, and proceeds diagonally through the transform coefficients.
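  • The three scans of FIGS. 1A-1C can be generated programmatically; the sketch below produces reverse scans that start at the bottom-right coefficient (e.g., 12P) and end at the top-left DC coefficient (e.g., 12A), with the anti-diagonal traversal being one plausible variant of the diagonal scan:

```python
# Reverse scan patterns for an n x n block, from the bottom-right position
# back to the top-left (DC) position, as in FIGS. 1A-1C.

def horizontal_scan(n):
    return [(r, c) for r in range(n) for c in range(n)][::-1]

def vertical_scan(n):
    return [(r, c) for c in range(n) for r in range(n)][::-1]

def diagonal_scan(n):
    order = []
    for d in range(2 * n - 1):  # walk anti-diagonals outward from the DC corner
        for r in range(min(d, n - 1), max(0, d - n + 1) - 1, -1):
            order.append((r, d - r))
    return order[::-1]

for scan in (horizontal_scan, vertical_scan, diagonal_scan):
    positions = scan(4)
    print(scan.__name__, positions[0], "...", positions[-1])  # (3, 3) ... (0, 0)
```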
  • the video encoder may determine the location of the last significant coefficient (e.g., the last transform coefficient with a non-zero value) in the block.
  • the video encoder may scan starting from the last significant coefficient and ending on the first transform coefficient.
  • the video encoder may signal the location of the last significant coefficient in the coded bitstream (i.e., the x and y coordinates of the last significant coefficient), and the video decoder may receive the location of the last significant coefficient from the coded bitstream. In this manner, the video decoder may determine that subsequent syntax elements for the transform coefficients (e.g., the significance syntax elements) are for transform coefficients starting from the last significant coefficient and ending on the first transform coefficient.
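  • A minimal encoder-side sketch of this behavior follows; the helper names are hypothetical, and for simplicity it produces a flag for every position from the last significant coefficient onward (in practice the flag at the signaled position itself can be inferred to be 1):

```python
# Locate the last significant coefficient along the scan and emit
# significance flags only from that position back to the DC coefficient.

def last_significant(coeffs, scan):
    for idx, pos in enumerate(scan):          # scan runs from the end back to DC
        if coeffs.get(pos, 0) != 0:
            return idx, pos
    return None, None

def significance_flags(coeffs, scan):
    idx, pos = last_significant(coeffs, scan)
    if idx is None:
        return None, []
    flags = [1 if coeffs.get(p, 0) != 0 else 0 for p in scan[idx:]]
    return pos, flags                         # pos is signaled as (x, y) in the bitstream

coeffs = {(0, 0): 9, (0, 1): -3, (1, 0): 1}   # toy 4x4 block, mostly zero
reverse_horizontal = [(r, c) for r in range(4) for c in range(4)][::-1]
print(significance_flags(coeffs, reverse_horizontal))  # ((1, 0), [1, 0, 0, 1, 1])
```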
  • Although FIGS. 1A-1C are illustrated as 4×4 blocks, the techniques described in this disclosure are not so limited, and the techniques can be extended to other sized blocks.
  • For instance, one or more of 4×4 blocks 10, 14, and 18 may be sub-blocks of a larger block. For example, an 8×8 block can be divided into four 4×4 sub-blocks, a 16×16 block can be divided into sixteen 4×4 sub-blocks, and so forth, and one or more of 4×4 blocks 10, 14, and 18 may be sub-blocks of the 8×8 block or 16×16 block.
  • Examples of sub-block horizontal and vertical scans are described in: (1) Rosewarne, C., Maeda, M., “Non-CE11: Harmonisation of 8×8 TU residual scan,” JCT-VC Contribution JCTVC-H0145; and (2) Yu, Y., Panusopone, K., Lou, J., Wang, L.
  • Transform coefficients 12 , 16 , and 20 represent transformed residual values between a block that is being predicted and another block.
  • the video encoder generates significance syntax elements that indicate whether the values of transform coefficients 12 , 16 , and 20 are zero or non-zero, encodes the significance syntax elements, and signals the encoded significance syntax elements in a coded bitstream.
  • the video decoder receives the coded bitstream and decodes the significance syntax elements as part of the process of determining transform coefficients 12 , 16 , and 20 .
  • the video encoder and the video decoder determine contexts that are to be used for context adaptive binary arithmetic coding (CABAC) encoding and decoding.
  • the video encoder and the video decoder account for the scan order.
  • For example, if the video encoder and the video decoder determine that the scan order is a horizontal scan, then the video encoder and the video decoder may determine a first set of contexts for the sixteen transform coefficients 12 of TB 10. If the video encoder and the video decoder determine that the scan order is a vertical scan, then the video encoder and the video decoder may determine a second set of contexts for the sixteen transform coefficients 16 of TB 14. If the video encoder and the video decoder determine that the scan order is a diagonal scan, then the video encoder and the video decoder may determine a third set of contexts for the sixteen transform coefficients 20 of TB 18.
  • two or more scan orders may share contexts.
  • two or more of the first set of contexts, second set of contexts, and the third set of contexts may be the same set of contexts.
  • the first set of contexts for the horizontal scan may be the same as the second set of contexts for the vertical scan.
  • the first, second, and third contexts may be the same set of contexts.
  • the video encoder and the video decoder determine from a first, second, and third set of contexts the contexts to use for CABAC encoding and decoding based on the scan order. In some examples, the video encoder and the video decoder determine which contexts to use for CABAC encoding and decoding based on the scan order and a size of the block.
  • If the block is 8×8, the video encoder and the video decoder determine contexts from a fourth, fifth, and sixth set of contexts (one for each scan order) based on the scan order. If the block is 16×16, then the video encoder and the video decoder determine contexts from a seventh, eighth, and ninth set of contexts (one for each scan order) based on the scan order, and so forth. Similar to above, in some examples, there may be context sharing for the different sized blocks.
  • For example, for a 4×4 sized block, the video encoder and the video decoder may determine contexts that are the same for all scan orders, but for an 8×8 sized block, the video encoder and the video decoder determine contexts that are the same for a horizontal scan and a vertical scan (e.g., for transform coefficients in particular positions), and different contexts for the diagonal scan.
  • For 16×16 and 32×32 blocks, the video encoder and the video decoder may determine contexts that are the same for all scan orders and for both sizes. In some examples, for the 16×16 and 32×32 blocks, horizontal and vertical scans may not be applied. Other such permutations and combinations are possible, and are contemplated by this disclosure.
  • the scan order defines the arrangement of the transform coefficients.
  • the magnitude of the first transform coefficient (referred to as the DC coefficient) is generally the highest.
  • the magnitude of the second transform coefficient is the next highest (on average, but not necessarily), and so forth.
  • the location of the second transform coefficient is based on the scan order.
  • the second transform coefficient is the transform coefficient immediately to the right of the first transform coefficient (i.e., immediately right of transform coefficient 12 A).
  • the second transform coefficient is the transform coefficient immediately below the first transform coefficient (i.e., immediately below transform coefficient 16 A in FIG. 1B and immediately below transform coefficient 20 A in FIG. 1C ).
  • the significance statistics for a transform coefficient in a particular scan position may vary depending on the scan order. For example, in FIG. 1A , for the horizontal scan, the last transform coefficient in the first row may have much higher magnitude (on average) compared to the same transform coefficient in the vertical scan of FIG. 1B or the diagonal scan of FIG. 1C .
  • the context is based on the location of the transform coefficient, irrespective of the actual scan order (i.e., position based contexts for 4×4 and 8×8 blocks do not distinguish between the various scans).
  • the context for a transform coefficient located at (i, j) in the block is the same for the horizontal, vertical, and diagonal scans.
  • the scan order may have an effect on the significance statistics for the transform coefficients, and the techniques described in this disclosure may determine contexts based on the scan order to account for the significance statistics.
  • the video encoder and the video decoder may determine contexts that are the same for two or more scan orders.
  • the video encoder and the video decoder may determine contexts that are the same for two or more scan orders for particular locations of transform coefficients.
  • the horizontal and the vertical scan orders share the contexts for a particular block size by sharing contexts between the horizontal scan and a transpose of the block of the vertical scan.
  • the video encoder and the video decoder may determine the same context for a transform coefficient (i, j) for the horizontal scan and a transform coefficient (j, i) for a vertical scan for a particular block size.
  • For example, the contexts for the fourth (last) row of the block for the horizontal scan may be the same as the contexts for the fourth (last) column of the block for the vertical scan; the contexts for the third row of the block for the horizontal scan may be the same as the contexts for the third column of the block for the vertical scan; the contexts for the second row of the block for the horizontal scan may be the same as the contexts for the second column of the block for the vertical scan; and the contexts for the first row of the block for the horizontal scan may be the same as the contexts for the first column of the block for the vertical scan.
  • the same may be applied to 8×8 blocks.
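  • A sketch of this transpose-based sharing follows; the function name is hypothetical:

```python
def shared_context_position(scan_order, row, col):
    """Hypothetical transpose-based sharing: a coefficient at (i, j) under the
    horizontal scan uses the same context as the coefficient at (j, i) under
    the vertical scan, so both scans index one common context set."""
    if scan_order == "vertical":
        row, col = col, row  # transpose vertical-scan positions onto the horizontal layout
    return row, col

# Row 2 of the horizontal scan shares contexts with column 2 of the vertical scan:
print(shared_context_position("horizontal", 2, 0))  # (2, 0)
print(shared_context_position("vertical", 0, 2))    # (2, 0) -> same context
```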
  • contexts may be shared between different block sizes (e.g., shared between a 4×4 block and an 8×8 block).
  • the context for transform coefficient (1, 1) in a 4×4 block and the context for transform coefficients (2, 2), (2, 3), (3, 2), and (3, 3) in an 8×8 block may be the same, and in some examples, may be the same for a particular scan order.
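  • One way to realize this cross-size sharing is to map 8×8 positions onto the 4×4 context grid by halving the coordinates; this is a hypothetical sketch consistent with the example positions above:

```python
def context_grid_position(block_width, row, col):
    """Hypothetical mapping: 8x8 positions reuse the 4x4 context grid by
    integer-halving their coordinates, so (2, 2), (2, 3), (3, 2), and (3, 3)
    in an 8x8 block all share the context of (1, 1) in a 4x4 block."""
    if block_width == 8:
        return row // 2, col // 2
    return row, col

for pos in ((2, 2), (2, 3), (3, 2), (3, 3)):
    print(pos, "->", context_grid_position(8, *pos))  # all map to (1, 1)
```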
  • FIG. 2 is a conceptual diagram illustrating a mapping of transform coefficients to significance syntax elements.
  • the left side of FIG. 2 illustrates transform coefficient values, and the right side of FIG. 2 illustrates the corresponding significance syntax elements (e.g., significance flags).
  • the video encoder and the video decoder are configured to CABAC encode and CABAC decode the example significance syntax elements illustrated in FIG. 2 by determining contexts based on the scan order, and in some examples, also based on positions of the transform coefficients and the size of the block.
  • FIG. 3 is a block diagram illustrating an example video encoding and decoding system 22 that may be configured to assign contexts utilizing the techniques described in this disclosure.
  • system 22 includes a source device 24 that generates encoded video data to be decoded at a later time by a destination device 26 .
  • Source device 24 and destination device 26 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called “smart” phones, so-called “smart” pads, televisions, cameras, display devices, digital media players, video gaming consoles, video streaming devices, or the like.
  • source device 24 and destination device 26 may be equipped for wireless communication.
  • Link 28 may comprise any type of medium or device capable of moving the encoded video data from source device 24 to destination device 26 .
  • link 28 may comprise a communication medium to enable source device 24 to transmit encoded video data directly to destination device 26 in real-time.
  • the encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 26 .
  • the communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines.
  • the communication medium may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet.
  • the communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 24 to destination device 26 .
  • encoded data may be output from output interface 34 to a storage device 38 .
  • encoded data may be accessed from storage device 38 by input interface 40 .
  • Storage device 38 may include any of a variety of distributed or locally accessed data storage media such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data.
  • storage device 38 may correspond to a file server or another intermediate storage device that may hold the encoded video generated by source device 24 .
  • Destination device 26 may access stored video data from storage device 38 via streaming or download.
  • the file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to the destination device 26 .
  • Example file servers include a web server (e.g., for a website), an FTP server, network attached storage (NAS) devices, or a local disk drive.
  • Destination device 26 may access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server.
  • the transmission of encoded video data from storage device 38 may be a streaming transmission, a download transmission, or a combination of both.
  • the techniques of this disclosure are not necessarily limited to wireless applications or settings.
  • the techniques may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions, e.g., via the Internet, encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications.
  • system 22 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
  • source device 24 includes a video source 30 , video encoder 32 and an output interface 34 .
  • output interface 34 may include a modulator/demodulator (modem) and/or a transmitter.
  • video source 30 may include a source such as a video capture device, e.g., a video camera, a video archive containing previously captured video, a video feed interface to receive video from a video content provider, and/or a computer graphics system for generating computer graphics data as the source video, or a combination of such sources.
  • source device 24 and destination device 26 may form so-called camera phones or video phones.
  • the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications.
  • the captured, pre-captured, or computer-generated video may be encoded by video encoder 32 .
  • the encoded video data may be transmitted directly to destination device 26 via output interface 34 of source device 24 .
  • the encoded video data may also (or alternatively) be stored onto storage device 38 for later access by destination device 26 or other devices, for decoding and/or playback.
  • Destination device 26 includes an input interface 40 , a video decoder 42 , and a display device 44 .
  • input interface 40 may include a receiver and/or a modem.
  • Input interface 40 of destination device 26 receives the encoded video data over link 28 .
  • the encoded video data communicated over link 28 may include a variety of syntax elements generated by video encoder 32 for use by a video decoder, such as video decoder 42 , in decoding the video data.
  • Such syntax elements may be included with the encoded video data transmitted on a communication medium, stored on a storage medium, or stored on a file server.
  • Display device 44 may be integrated with, or external to, destination device 26 .
  • destination device 26 may include an integrated display device and also be configured to interface with an external display device.
  • destination device 26 may be a display device.
  • display device 44 displays the decoded video data to a user, and may comprise any of a variety of display devices such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.
  • Video encoder 32 and video decoder 42 may operate according to a video compression standard, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4, Part 10, Advanced Video Coding (AVC), or extensions of such standards.
  • video encoder 32 and video decoder 42 may operate according to other proprietary or industry standards, such as the High Efficiency Video Coding (HEVC) standard, and may conform to the HEVC Test Model (HM).
  • the techniques of this disclosure are not limited to any particular coding standard.
  • Other examples of video compression standards include MPEG-2 and ITU-T H.263.
  • video encoder 32 and video decoder 42 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, in some examples, MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).
  • Video encoder 32 and video decoder 42 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof.
  • a device may store instructions for the software in a suitable computer-readable storage medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure.
  • Each of video encoder 32 and video decoder 42 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.
  • the device that includes video decoder 42 may be a microprocessor, an integrated circuit (IC), or a wireless communication device that includes video decoder 42.
  • the JCT-VC is working on development of the HEVC standard.
  • the HEVC standardization efforts are based on an evolving model of a video coding device referred to as the HEVC Test Model (HM).
  • HM presumes several additional capabilities of video coding devices relative to existing devices according to, e.g., ITU-T H.264/AVC. For example, whereas H.264 provides nine intra-prediction encoding modes, the HM may provide as many as thirty-five intra-prediction encoding modes.
  • the working model of the HM describes that a video frame or picture may be divided into a sequence of treeblocks or largest coding units (LCU) that include both luma and chroma samples.
  • a treeblock has a similar purpose as a macroblock of the H.264 standard.
  • a slice includes a number of consecutive treeblocks in coding order.
  • a video frame or picture may be partitioned into one or more slices.
  • Each treeblock may be split into coding units (CUs) according to a quadtree. For example, a treeblock, as a root node of the quadtree, may be split into four child nodes, and each child node may in turn be a parent node and be split into another four child nodes.
  • a final, unsplit child node, as a leaf node of the quadtree, comprises a coding node, i.e., a coded video block.
  • Syntax data associated with a coded bitstream may define a maximum number of times a treeblock may be split, and may also define a minimum size of the coding nodes.
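  • The quadtree splitting described above can be sketched as a simple recursion; the split-decision callback and size limits below are hypothetical placeholders for what the encoder actually decides and signals:

```python
# Recursive quadtree split of a treeblock into coding nodes. A node either
# splits into four equal children or becomes a leaf (a coded video block).

def split_quadtree(x, y, size, min_size, should_split):
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]            # leaf: a coding node
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += split_quadtree(x + dx, y + dy, half, min_size, should_split)
    return leaves

# Example: split a 64x64 treeblock once; the four 32x32 children stay unsplit.
print(split_quadtree(0, 0, 64, 8, lambda x, y, s: s == 64))
```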
  • a CU includes a coding node and prediction units (PUs) and transform units (TUs) associated with the coding node.
  • a transform unit includes one or more transform blocks, and the techniques described in this disclosure are related to determining contexts for the significance syntax elements for the transform coefficients of a transform block based on a scan order and, in some examples, based on a scan order and size of the transform block.
  • a size of the CU corresponds to a size of the coding node and must be square in shape. The size of the CU may range from 8×8 pixels up to the size of the treeblock with a maximum of 64×64 pixels or greater.
  • Each CU may contain one or more PUs and one or more TUs.
  • Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ between whether the CU is skip or direct mode encoded, intra-prediction mode encoded, or inter-prediction mode encoded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a quadtree.
  • a TU can be square or non-square in shape.
  • a TU includes one or more transform blocks (TBs) (e.g., one TB for the luma samples, one TB for the first chroma samples, and one TB for the second chroma samples).
  • a TU can be considered conceptually as including these TBs, and these TBs can be square or non-square in shape.
  • the term TU is used to generically refer to the TBs, and the example techniques described in this disclosure are described with respect to a TB.
  • the HEVC standard allows for transformations according to TUs, which may be different for different CUs.
  • the TUs are typically sized based on the size of PUs within a given CU defined for a partitioned LCU, although this may not always be the case.
  • the TUs are typically the same size or smaller than the PUs.
  • residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure known as “residual quad tree” (RQT).
  • the leaf nodes of the RQT may be referred to as transform units (TUs).
  • Pixel difference values associated with the TUs may be transformed to produce transform coefficients, which may be quantized.
  • a PU includes data related to the prediction process.
  • the PU when the PU is intra-mode encoded (intra-prediction encoded), the PU may include data describing an intra-prediction mode for the PU.
  • the PU when the PU is inter-mode encoded (inter-prediction encoded), the PU may include data defining a motion vector for the PU.
  • the data defining the motion vector for a PU may describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution for the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list (e.g., List 0 (L0) or List 1 (L1)) for the motion vector.
  • a TU is used for the transform and quantization processes.
  • a given CU having one or more PUs may also include one or more transform units (TUs).
  • the TUs include one or more transform blocks (TBs).
  • Blocks 10 , 14 , and 18 of FIGS. 1A-1C are examples of TBs.
  • video encoder 32 may calculate residual values corresponding to the PU.
  • the residual values comprise pixel difference values that may be transformed into transform coefficients, quantized, and scanned using the TBs to produce serialized transform coefficients for entropy coding.
  • This disclosure typically uses the term “video block” to refer to a coding node of a CU.
  • This disclosure may also use the term “video block” to refer to a treeblock, i.e., LCU, or a CU, which includes a coding node and PUs.
  • The term “video block” may also refer to transform blocks of a TU.
  • a video picture may be partitioned into coding units (CUs), prediction units (PUs), and transform units (TUs).
  • a CU generally refers to an image region that serves as a basic unit to which various coding tools are applied for video compression.
  • a CU typically has a square geometry, and may be considered to be similar to a so-called “macroblock” under other video coding standards, such as, for example, ITU-T H.264.
  • a CU may have a variable size depending on the video data it contains. That is, a CU may be partitioned, or “split” into smaller blocks, or sub-CUs, each of which may also be referred to as a CU. In addition, each CU that is not split into sub-CUs may be further partitioned into one or more PUs and TUs for purposes of prediction and transform of the CU, respectively.
  • PUs may be considered to be similar to so-called partitions of a block under other video coding standards, such as H.264.
  • PUs are the basis on which prediction for the block is performed to produce “residual” coefficients.
  • Residual coefficients of a CU represent a difference between video data of the CU and predicted data for the CU determined using one or more PUs of the CU.
  • the one or more PUs specify how the CU is partitioned for the purpose of prediction, and which prediction mode is used to predict the video data contained within each partition of the CU.
  • One or more TUs of a CU specify partitions of a block of residual coefficients of the CU on the basis of which a transform is applied to the block to produce a block of residual transform coefficients for the CU.
  • the one or more TUs may also be associated with the type of transform that is applied.
  • the transform converts the residual coefficients from a pixel, or spatial domain to a transform domain, such as a frequency domain.
  • the one or more TUs may specify parameters on the basis of which quantization is applied to the resulting block of residual transform coefficients to produce a block of quantized residual transform coefficients.
  • the residual transform coefficients may be quantized to possibly reduce the amount of data used to represent the coefficients.
  • a CU generally includes one luminance component, denoted as Y, and two chrominance components, denoted as U and V.
  • a given CU that is not further split into sub-CUs may include Y, U, and V components, each of which may be further partitioned into one or more PUs and TUs for purposes of prediction and transform of the CU, as previously described.
  • the size of the U and V components, in terms of a number of samples, may be the same as or different from the size of the Y component.
  • the techniques described above with reference to prediction, transform, and quantization may be performed for each of the Y, U, and V components of a given CU.
  • one or more predictors for the CU are first derived based on one or more PUs of the CU.
  • a predictor is a reference block that contains predicted data for the CU, and is derived on the basis of a corresponding PU for the CU, as previously described.
  • the PU indicates a partition of the CU for which predicted data is to be determined, and a prediction mode used to determine the predicted data.
  • the predictor can be derived either through intra-(I) prediction (i.e., spatial prediction) or inter-(P or B) prediction (i.e., temporal prediction) modes.
  • some CUs may be intra-coded (I) using spatial prediction with respect to neighboring reference blocks, or CUs, in the same frame, while other CUs may be inter-coded (P or B) with respect to reference blocks, or CUs, in other frames.
  • a difference between the original video data of the CU corresponding to the one or more PUs and the predicted data for the CU contained in the one or more predictors is calculated.
  • This difference, also referred to as a prediction residual, comprises residual coefficients, and refers to pixel differences between portions of the CU specified by the one or more PUs and the one or more predictors, as previously described.
  • the residual coefficients are generally arranged in a two-dimensional (2-D) array that corresponds to the one or more PUs of the CU.
  • the prediction residual is generally transformed, e.g., using a discrete cosine transform (DCT), integer transform, Karhunen-Loeve (K-L) transform, or another transform.
  • the transform converts the prediction residual, i.e., the residual coefficients, in the spatial domain to residual transform coefficients in the transform domain, e.g., a frequency domain, as also previously described.
  • the transform is skipped, i.e., no transform is applied to the prediction residual.
  • Transform-skipped coefficients are also referred to as transform coefficients.
  • the transform coefficients (including transform skip coefficients) are also generally arranged in a 2-D array that corresponds to the one or more TUs of the CU.
  • the residual transform coefficients may be quantized to possibly reduce the amount of data used to represent the coefficients, as also previously described.
  • an entropy coder subsequently encodes the resulting residual transform coefficients, using Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), Probability Interval Partitioning Entropy Coding (PIPE), or another entropy coding methodology.
  • Entropy coding may achieve this further compression by reducing or removing statistical redundancy inherent in the video data of the CU, represented by the coefficients, relative to other CUs.
  • a video sequence typically includes a series of video frames or pictures.
  • a group of pictures (GOP) generally comprises a series of one or more of the video pictures.
  • a GOP may include syntax data in a header of the GOP, a header of one or more of the pictures, or elsewhere, that describes a number of pictures included in the GOP.
  • Each slice of a picture may include slice syntax data that describes an encoding mode for the respective slice.
  • Video encoder 32 typically operates on video blocks within individual video slices in order to encode the video data.
  • a video block may correspond to a coding node within a CU (e.g., a transform block of transform coefficients).
  • the video blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard.
  • the HM supports prediction in various PU sizes. Assuming that the size of a particular CU is 2N×2N, the HM supports intra-prediction in PU sizes of 2N×2N or N×N, and inter-prediction in symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. The HM also supports asymmetric partitioning for inter-prediction in PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75% portions.
  • the portion of the CU corresponding to the 25% partition is indicated by an “n” followed by an indication of “Up,” “Down,” “Left,” or “Right.”
  • “2N×nU” refers to a 2N×2N CU that is partitioned horizontally with a 2N×0.5N PU on top and a 2N×1.5N PU on bottom; the sketch below computes the dimensions for all four asymmetric modes.
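  • A minimal sketch of the asymmetric-mode arithmetic described above, assuming a 2N×2N CU; the function name is illustrative, and N is assumed even (as it is for the power-of-two CU sizes discussed here).

```cpp
#include <cstdio>

// Print the PU dimensions for the four asymmetric inter-prediction modes
// of a 2N x 2N CU: one direction is unsplit, the other splits 25%/75%.
void printAsymmetricPartitions(int N) {
    std::printf("2NxnU: top %dx%d, bottom %dx%d\n", 2 * N, N / 2, 2 * N, 3 * N / 2);
    std::printf("2NxnD: top %dx%d, bottom %dx%d\n", 2 * N, 3 * N / 2, 2 * N, N / 2);
    std::printf("nLx2N: left %dx%d, right %dx%d\n", N / 2, 2 * N, 3 * N / 2, 2 * N);
    std::printf("nRx2N: left %dx%d, right %dx%d\n", 3 * N / 2, 2 * N, N / 2, 2 * N);
}
```

For N = 16 (a 32×32 CU), for example, 2N×nU yields a 32×8 PU on top and a 32×24 PU on bottom.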
  • N×N and “N by N” may be used interchangeably to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions, e.g., 16×16 pixels or 16 by 16 pixels.
  • an N×N block generally has N pixels in a vertical direction and N pixels in a horizontal direction, where N represents a nonnegative integer value.
  • the pixels in a block may be arranged in rows and columns.
  • blocks need not necessarily have the same number of pixels in the horizontal direction as in the vertical direction.
  • blocks may comprise N×M pixels, where M is not necessarily equal to N.
  • video encoder 32 may calculate residual data for the TUs of the CU.
  • the PUs may comprise pixel data in the spatial domain (also referred to as the pixel domain) and the TUs may comprise coefficients in the transform domain following application of a transform, e.g., a discrete cosine transform (DCT), an integer transform, a wavelet transform, skip transform, or a conceptually similar transform to residual video data.
  • the residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the PUs.
  • Video encoder 32 may form the TUs including the residual data for the CU, and then transform the TUs to produce transform coefficients for the CU.
  • video encoder 32 may perform quantization of the transform coefficients.
  • Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the coefficients, providing further compression.
  • the quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
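  • A minimal sketch of the bit-depth reduction just described, assuming plain truncation (rounding down) via a right shift; an actual quantizer also involves a quantization step size and rounding offsets.

```cpp
// Round an n-bit magnitude down to an m-bit magnitude (n > m) by
// discarding the (n - m) least significant bits.
unsigned roundDownBitDepth(unsigned value, int n, int m) {
    return value >> (n - m);
}
```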
  • video encoder 32 may utilize a predefined scan order (e.g., horizontal, vertical, or diagonal) to scan the quantized transform coefficients to produce a serialized vector that can be entropy encoded.
  • video encoder 32 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 32 may entropy encode the one-dimensional vector, e.g., according to context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), Probability Interval Partitioning Entropy (PIPE) coding or another entropy encoding methodology.
  • Video encoder 32 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 42 in decoding the video data.
  • video encoder 32 may assign a context within a context model to a symbol to be transmitted.
  • the context may relate to, for example, whether neighboring values of the symbol are non-zero or not.
  • video encoder 32 may select a variable length code for a symbol to be transmitted. Codewords in VLC may be constructed such that relatively shorter codes correspond to more probable symbols, while longer codes correspond to less probable symbols. In this way, the use of VLC may achieve a bit savings over, for example, using equal-length codewords for each symbol to be transmitted.
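  • The following worked example quantifies the bit savings described above, using assumed symbol probabilities and codeword lengths (illustrative values only, not drawn from any standard).

```cpp
// Expected code length of a VLC that assigns shorter codes to more probable
// symbols, compared with a 2-bit fixed-length code for a 4-symbol alphabet.
double expectedVlcBits() {
    const double p[4]   = {0.5, 0.25, 0.125, 0.125}; // most probable first
    const int    len[4] = {1, 2, 3, 3};              // e.g., codes 0, 10, 110, 111
    double bits = 0.0;
    for (int i = 0; i < 4; ++i)
        bits += p[i] * len[i];
    return bits; // 1.75 bits per symbol versus 2.0 for fixed-length codes
}
```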
  • the probability determination may be based on a context assigned to the symbol.
  • Video decoder 42 may be configured to implement the reciprocal of the encoding techniques implemented by video encoder 32. For example, for the encoded significance syntax elements, video decoder 42 may decode the significance syntax elements by determining which contexts to use based on the determined scan order.
  • video encoder 32 signals syntax elements that indicate the values of the transform coefficients. Video encoder 32 generates these syntax elements in five passes, as one example, although five passes are not necessary in every example. Video encoder 32 determines the location of the last significant coefficient and begins the first pass from the last significant coefficient. After the first pass, video encoder 32 implements each of the remaining four passes only on those transform coefficients remaining from the previous pass. In the first pass, video encoder 32 scans the transform coefficients using one of the scan orders illustrated in FIGS. 1A-1C and determines a significance syntax element for each transform coefficient that indicates whether the value for the transform coefficient is zero or non-zero (i.e., insignificant or significant).
  • In the second and third passes, referred to as the greater-than-1 and greater-than-2 passes, video encoder 32 generates syntax elements indicating whether the absolute value of a significant coefficient is greater than one and greater than two, respectively. In the fourth pass, referred to as a sign pass, video encoder 32 generates syntax elements to indicate the sign information for significant coefficients.
  • In the fifth pass, referred to as a coefficient level remaining pass, video encoder 32 generates syntax elements that indicate the remaining absolute value of a transform coefficient level (e.g., the remainder value). The remainder value may be coded as the absolute value of the coefficient minus 3, as in the sketch below. It should be noted that the five-pass approach is just one example technique that may be used for coding transform coefficients, and the techniques described herein may be equally applicable to other techniques.
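  • A minimal sketch of that remainder computation, following the “absolute value minus 3” rule stated above; the function name is illustrative.

```cpp
// Remaining absolute level for a coefficient that reaches the fifth pass.
// The first three units of magnitude are already conveyed by the
// significance, greater-than-1, and greater-than-2 flags.
int remainderValue(int level) {
    int absLevel = (level < 0) ? -level : level;
    return absLevel - 3; // meaningful only when absLevel >= 3
}
```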
  • video encoder 32 encodes the significance syntax elements using context adaptive binary arithmetic coding (CABAC).
  • video encoder 32 may determine a scan order for the transform coefficients of the block, and determine contexts for the significance syntax elements of the transform coefficients of the block based on the determined scan order.
  • Video encoder 32 may CABAC encode the significance syntax elements based on the determined contexts, and signal the encoded significance syntax elements in the coded bitstream.
  • Video decoder 42 may be configured to perform similar functions. For example, video decoder 42 receives from the coded bitstream significance syntax elements of transform coefficients of a block. Video decoder 42 may determine a scan order for the transform coefficients of the block (e.g., an order in which video encoder 32 scanned the transform coefficients), and may determine contexts for the significance syntax elements based on the determined scan order. Video decoder 42 may then CABAC decode the significance syntax elements of the transform coefficients based at least in part on the determined contexts.
  • video encoder 32 and video decoder 42 each determine contexts that are the same if the determined scan order is a horizontal scan or a vertical scan, and determine different contexts (different from those for the horizontal and vertical scans) if the determined scan order is a diagonal scan.
  • video encoder 32 and video decoder 42 may each determine a first set of contexts for the significance syntax elements if the scan order is a first scan order, and determine a second set of contexts for the significance syntax elements if the scan order is a second scan order.
  • the first set of contexts and the second set of contexts may be same in some cases (e.g., where the first scan order is a horizontal scan and the second scan order is a vertical scan, or vice-versa).
  • the first set of contexts and the second set of contexts may be different in some cases (e.g., where the first scan order is either a horizontal or a vertical scan and the second scan order is not a horizontal or a vertical scan).
  • video encoder 32 and video decoder 42 also determine a size of the block. In some of these examples, video encoder 32 and video decoder 42 determine the contexts for the significance syntax elements based on the determined scan order and based on the determined size of the block. For example, to determine the contexts, video encoder 32 and video decoder 42 may determine, based on the size of the block, contexts for the significance syntax elements of the transform coefficients that are the same for all scan orders. In other words, for certain sized blocks, video encoder 32 and video decoder 42 may determine contexts that are the same for all scan orders, as in the sketch below.
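  • A minimal sketch of the scan-order- and size-dependent context selection described above; the set identifiers and the exact sharing rules are illustrative placeholders, not normative values.

```cpp
// Choose a context set for significance syntax elements from the scan
// order and the block size: horizontal and vertical scans share a set,
// the diagonal scan uses its own, and for 4x4 blocks one set is shared
// by all scan orders.
enum ScanOrder { HORIZONTAL, VERTICAL, DIAGONAL };

int significanceContextSet(ScanOrder scan, int blockSize) {
    if (blockSize == 4)
        return 0; // same contexts regardless of scan order
    if (scan == HORIZONTAL || scan == VERTICAL)
        return 1; // shared by the horizontal and vertical scans
    return 2;     // diagonal scan uses a different set
}
```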
  • the techniques described in this disclosure may build upon the concepts of sub-block horizontal and vertical scans, such as those described in: (1) Rosewarne, C., Maeda, M. “Non-CE11: Harmonisation of 8 ⁇ 8 TU residual scan” JCT-VC Contribution JCTVC-H0145; (2) Yu, Y., Panusopone, K., Lou, J., Wang, L. “Adaptive Scan for Large Blocks for HEVC; JCT-VC Contribution JCTVC-F569; and (3) U.S. patent application Ser. No. 13/551,458, filed Jul. 17, 2012.
  • the techniques described in this disclosure provide for improvement in the coding of significance syntax elements and harmonization across different scan orders and block (e.g., TU) sizes.
  • a 4×4 block may be a sub-block of a relatively large block (e.g., a 16×16 or 32×32 block).
  • video encoder 32 and video decoder 42 may be configured to determine the contexts for the 4×4 sub-blocks based on the scan order.
  • such techniques may be extendable to 8×8 sized blocks as well as to all scan orders (i.e., the 4×4 sub-blocks of the 8×8 block can be scanned horizontally, vertically, or diagonally).
  • Such techniques may also allow for context sharing between the different scan orders.
  • video encoder 32 and video decoder 42 determine contexts that are the same for all block sizes if the scan order is a diagonal scan (i.e., the contexts are shared for all of the TUs when using the diagonal scan). In this example, video encoder 32 and video decoder 42 may determine another set of contexts that are the same for the horizontal and vertical scan, which allows for context sharing depending on the scan order.
  • Other combinations and permutations of the sizes and the scan orders may be possible, and video encoder 32 and video decoder 42 may be configured to determine contexts that are the same for these various combinations and permutations of sizes and scan orders.
  • FIG. 4 is a block diagram illustrating an example video encoder 32 that may implement the techniques described in this disclosure.
  • video encoder 32 includes a mode select unit 46, prediction processing unit 48, reference picture memory 70, summer 56, transform processing unit 58, quantization processing unit 60, and entropy encoding unit 62.
  • Prediction processing unit 48 includes motion estimation unit 50, motion compensation unit 52, and intra prediction unit 54.
  • video encoder 32 also includes inverse quantization processing unit 64, inverse transform processing unit 66, and summer 68.
  • a deblocking filter (not shown in FIG. 4) may also be included to filter block boundaries to remove blockiness artifacts from reconstructed video.
  • the deblocking filter would typically filter the output of summer 68. Additional loop filters (in loop or post loop) may also be used in addition to the deblocking filter. It should be noted that prediction processing unit 48 and transform processing unit 58 should not be confused with PUs and TUs as described above.
  • video encoder 32 receives video data, and mode select unit 46 partitions the data into video blocks. This partitioning may also include partitioning into slices, tiles, or other larger units, as well as video block partitioning, e.g., according to a quadtree structure of LCUs and CUs.
  • Video encoder 32 generally illustrates the components that encode video blocks within a video slice to be encoded. A slice may be divided into multiple video blocks (and possibly into sets of video blocks referred to as tiles).
  • Prediction processing unit 48 may select one of a plurality of possible coding modes, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes, for the current video block based on error results (e.g., coding rate and the level of distortion). Prediction processing unit 48 may provide the resulting intra- or inter-coded block to summer 56 to generate residual block data and to summer 68 to reconstruct the encoded block for use as a reference picture.
  • Intra prediction unit 54 within prediction processing unit 48 may perform intra-predictive coding of the current video block relative to one or more neighboring blocks in the same frame or slice as the current block to be coded to provide spatial compression.
  • Motion estimation unit 50 and motion compensation unit 52 within prediction processing unit 48 perform inter-predictive coding of the current video block relative to one or more predictive blocks in one or more reference pictures to provide temporal compression.
  • Motion estimation unit 50 may be configured to determine the inter-prediction mode for a video slice according to a predetermined pattern for a video sequence.
  • the predetermined pattern may designate video slices in the sequence as P slices or B slices.
  • Motion estimation unit 50 and motion compensation unit 52 may be highly integrated, but are illustrated separately for conceptual purposes.
  • Motion estimation, performed by motion estimation unit 50 is the process of generating motion vectors, which estimate motion for video blocks.
  • a motion vector for example, may indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference picture.
  • a predictive block is a block that is found to closely match the PU of the video block to be coded in terms of pixel difference, which may be determined by sum of absolute difference (SAD), sum of square difference (SSD), or other difference metrics.
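  • For reference, a straightforward sketch of the SAD metric named above; SSD would square each difference instead of taking its absolute value.

```cpp
#include <cstdlib>

// Sum of absolute differences between the block being coded and a
// candidate predictive block, each stored row by row with its own stride.
int sad(const unsigned char* cur, int curStride,
        const unsigned char* ref, int refStride,
        int width, int height) {
    int sum = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            sum += std::abs(cur[y * curStride + x] - ref[y * refStride + x]);
    return sum;
}
```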
  • video encoder 32 may calculate values for sub-integer pixel positions of reference pictures stored in reference picture memory 70. For example, video encoder 32 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Therefore, motion estimation unit 50 may perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
  • Motion estimation unit 50 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture.
  • the reference picture may be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identify one or more reference pictures stored in reference picture memory 70 .
  • Motion estimation unit 50 sends the calculated motion vector to entropy encoding unit 62 and motion compensation unit 52.
  • Motion compensation performed by motion compensation unit 52 may involve fetching or generating the predictive block based on the motion vector determined by motion estimation, possibly performing interpolations to sub-pixel precision.
  • motion compensation unit 52 may locate the predictive block to which the motion vector points in one of the reference picture lists.
  • Video encoder 32 forms a residual video block by subtracting pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values.
  • the pixel difference values form residual data for the block, and may include both luma and chroma difference components.
  • Summer 56 represents the component or components that perform this subtraction operation.
  • Motion compensation unit 52 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 42 in decoding the video blocks of the video slice.
  • Intra-prediction unit 54 may intra-predict a current block, as an alternative to the inter-prediction performed by motion estimation unit 50 and motion compensation unit 52, as described above. In particular, intra-prediction unit 54 may determine an intra-prediction mode to use to encode a current block. In some examples, intra-prediction unit 54 may encode a current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction unit 54 (or mode select unit 46, in some examples) may select an appropriate intra-prediction mode to use from the tested modes.
  • intra-prediction unit 54 may calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes, and select the intra-prediction mode having the best rate-distortion characteristics among the tested modes.
  • Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as a bit rate (that is, a number of bits) used to produce the encoded block.
  • Intra-prediction unit 54 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
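  • One common way to realize such a selection is a Lagrangian cost J = D + λ·R; the text speaks of ratios of distortion and rate, so the sketch below is an illustrative stand-in (with an assumed λ) rather than the encoder's actual calculation.

```cpp
#include <limits>

// Pick the intra-prediction mode with the lowest rate-distortion cost,
// modeled here as distortion plus lambda times rate.
int selectBestMode(const double* distortion, const double* rate,
                   int numModes, double lambda) {
    int best = 0;
    double bestCost = std::numeric_limits<double>::max();
    for (int m = 0; m < numModes; ++m) {
        double cost = distortion[m] + lambda * rate[m];
        if (cost < bestCost) {
            bestCost = cost;
            best = m;
        }
    }
    return best;
}
```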
  • intra-prediction unit 54 may provide information indicative of the selected intra-prediction mode for the block to entropy encoding unit 62.
  • Entropy encoding unit 62 may encode the information indicating the selected intra-prediction mode in accordance with the entropy techniques described herein.
  • video encoder 32 forms a residual video block by subtracting the predictive block from the current video block.
  • the residual video data in the residual block may be included in one or more TBs and applied to transform processing unit 58 .
  • Transform processing unit 58 may transform the residual video data into residual transform coefficients using a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform.
  • Transform processing unit 58 may convert the residual video data from a pixel domain to a transform domain, such as a frequency domain.
  • transform processing unit 58 may apply a 2-dimensional (2-D) transform (in both the horizontal and vertical direction) to the residual data in the TBs.
  • transform processing unit 58 may instead apply a horizontal 1-D transform, a vertical 1-D transform, or no transform to the residual data in each of the TBs.
  • Transform processing unit 58 may send the resulting transform coefficients to quantization processing unit 60 .
  • Quantization processing unit 60 quantizes the transform coefficients to further reduce the bit rate.
  • the quantization process may reduce the bit depth associated with some or all of the coefficients.
  • the degree of quantization may be modified by adjusting a quantization parameter.
  • quantization processing unit 60 may then perform a scan of the matrix including the quantized transform coefficients.
  • entropy encoding unit 62 may perform the scan.
  • the scan performed on a transform block may be based on the size of the transform block.
  • Quantization processing unit 60 and/or entropy encoding unit 62 may scan 8×8, 16×16, and 32×32 transform blocks using any combination of the sub-block scans described above with respect to FIGS. 1A-1C.
  • entropy encoding unit 62 may determine a scan order based on a coding parameter associated with the transform block, such as a prediction mode associated with a prediction unit corresponding to the transform block. Further details with respect to entropy encoding unit 62 are described below with respect to FIG. 5 .
  • Inverse quantization processing unit 64 and inverse transform processing unit 66 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block of a reference picture.
  • Motion compensation unit 52 may calculate a reference block by adding the residual block to a predictive block of one of the reference pictures within one of the reference picture lists. Motion compensation unit 52 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation.
  • Summer 68 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 52 to produce a reference block for storage in reference picture memory 70.
  • the reference block may be used by motion estimation unit 50 and motion compensation unit 52 as a reference block to inter-predict a block in a subsequent video frame or picture.
  • entropy encoding unit 62 entropy encodes the quantized transform coefficients.
  • entropy encoding unit 62 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding or another entropy encoding methodology or technique.
  • the encoded bitstream may be transmitted to video decoder 42, or archived for later transmission or retrieval by video decoder 42.
  • Entropy encoding unit 62 may also entropy encode the motion vectors and the other syntax elements for the current video slice being coded. Entropy encoding unit 62 may entropy encode syntax elements such as the significance syntax elements and the other syntax elements for the transform coefficients described above using CABAC.
  • entropy encoding unit 62 may be configured to implement the techniques described in this disclosure of determining contexts based on a determined scan order. In some examples, entropy encoding unit 62 in conjunction with one or more units within video encoder 32 may be configured to implement the techniques described in this disclosure. In some examples, a processor or processing unit (not shown) of video encoder 32 may be configured to implement the techniques described in this disclosure.
  • FIG. 5 is a block diagram that illustrates an example entropy encoding unit 62 that may implement the techniques described in this disclosure.
  • the entropy encoding unit 62 illustrated in FIG. 5 may be a CABAC encoder.
  • the example entropy encoding unit 62 may include a binarization unit 72, an arithmetic encoding unit 80, which includes a bypass encoding engine 74 and a regular encoding engine 78, and a context modeling unit 76.
  • Entropy encoding unit 62 may receive one or more syntax elements, such as the significance syntax element, referred to as a significant_coeff_flag in HEVC; the greater-than-1 flag, referred to as a coeff_abs_level_greater1 flag in HEVC; the greater-than-2 flag, referred to as a coeff_abs_level_greater2 flag in HEVC; the sign flag, referred to as coeff_sign_flag in HEVC; and the level syntax element, referred to as coeff_abs_level_remain.
  • Binarization unit 72 receives a syntax element and produces a bin string (i.e., binary string).
  • Binarization unit 72 may use, for example, any one or combination of the following techniques to produce a bin string: fixed length coding, unary coding, truncated unary coding, truncated Rice coding, Golomb coding, exponential Golomb coding, and Golomb-Rice coding. Further, in some cases, binarization unit 72 may receive a syntax element as a binary string and simply pass through the bin values. In one example, binarization unit 72 receives the significance syntax element and produces a bin string.
  • Arithmetic encoding unit 80 is configured to receive a bin string from binarization unit 72 and perform arithmetic encoding on the bin string. As shown in FIG. 5, arithmetic encoding unit 80 may receive bin values from a bypass path or the regular coding path. Bin values that follow the bypass path may be bin values identified as bypass-coded, and bin values that follow the regular encoding path may be identified as CABAC-coded. Consistent with the CABAC process described above, in the case where arithmetic encoding unit 80 receives bin values from a bypass path, bypass encoding engine 74 may perform arithmetic encoding on the bin values without utilizing an adaptive context assigned to a bin value. In one example, bypass encoding engine 74 may assume equal probabilities for possible values of a bin.
  • context modeling unit 76 may provide a context variable (e.g., a context state), such that regular encoding engine 78 may perform arithmetic encoding based on the context assignments provided by context modeling unit 76 .
  • the context assignments may be defined according to a video coding standard, such as the HEVC standard.
  • context modeling unit 76 and/or entropy encoding unit 62 may be configured to determine contexts for bins of the significance syntax elements based on techniques described herein. The techniques may be incorporated into HEVC or another video coding standard.
  • the context models may be stored in memory.
  • Context modeling unit 76 may include a series of indexed tables and/or utilize mapping functions to determine a context and a context variable for a particular bin. After encoding a bin value, regular encoding engine 78 may update a context based on the actual bin values.
  • FIG. 6 is a flowchart illustrating an example process for encoding video data according to this disclosure. Although the process in FIG. 6 is described below as generally being performed by video encoder 32, the process may be performed by any combination of video encoder 32, entropy encoding unit 62, and/or context modeling unit 76.
  • video encoder 32 may determine a scan order for transform coefficients of a block (82). Video encoder 32 may determine contexts for the transform coefficients based on the scan order (84). In some examples, video encoder 32 determines the contexts based on the determined scan order, positions of the transform coefficients within the block, and a size of the block. For example, for a particular block size (e.g., an 8×8 block of transform coefficients) and a particular position (e.g., transform coefficient position), video encoder 32 may determine the same context if the scan order is either the horizontal scan or the vertical scan, and determine a different context if the scan order is not the horizontal scan or the vertical scan.
  • Video encoder 32 may CABAC encode significance syntax elements (e.g., significance flags) for the transform coefficients based on the determined contexts (86). Video encoder 32 may signal the encoded significance syntax elements (e.g., significance flags) (88).
  • FIG. 7 is a block diagram illustrating an example video decoder 42 that may implement the techniques described in this disclosure.
  • video decoder 42 includes an entropy decoding unit 90, prediction processing unit 92, inverse quantization processing unit 98, inverse transform processing unit 100, summer 102, and reference picture memory 104.
  • Prediction processing unit 92 includes motion compensation unit 94 and intra prediction unit 96.
  • Video decoder 42 may, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 32 from FIG. 4 .
  • video decoder 42 receives an encoded video bitstream that represents video blocks of an encoded video slice and associated syntax elements from video encoder 32 .
  • Entropy decoding unit 90 of video decoder 42 entropy decodes the bitstream to generate quantized coefficients, motion vectors, and other syntax elements.
  • Entropy decoding unit 90 forwards the motion vectors and other syntax elements to prediction processing unit 92 .
  • Video decoder 42 may receive the syntax elements at the video slice level and/or the video block level.
  • entropy decoding unit 90 may be configured to implement the techniques described in this disclosure of determining contexts based on a determined scan order. In some examples, entropy decoding unit 90 in conjunction with one or more units within video decoder 42 may be configured to implement the techniques described in this disclosure. In some examples, a processor or processing unit (not shown) of video decoder 42 may be configured to implement the techniques described in this disclosure.
  • FIG. 8 is a block diagram that illustrates an example entropy decoding unit 90 that may implement the techniques described in this disclosure.
  • Entropy decoding unit 90 receives an entropy encoded bitstream and decodes syntax elements from the bitstream. Syntax elements may include the significant_coeff_flag, coeff_abs_level_remain, coeff_abs_level_greater1 flag, coeff_abs_level_greater2 flag, and coeff_sign_flag syntax elements described above for transform coefficients of a block.
  • the example entropy decoding unit 90 in FIG. 8 includes an arithmetic decoding unit 106, which may include a bypass decoding engine 108 and a regular decoding engine 110.
  • the example entropy decoding unit 90 also includes context modeling unit 112 and inverse binarization unit 114.
  • the example entropy decoding unit 90 may perform the reciprocal functions of the example entropy encoding unit 62 described with respect to FIG. 5. In this manner, entropy decoding unit 90 may perform entropy decoding based on the techniques described in this disclosure.
  • Arithmetic decoding unit 106 receives an encoded bitstream. As shown in FIG. 8, arithmetic decoding unit 106 may process encoded bin values according to a bypass path or the regular coding path. An indication of whether an encoded bin value should be processed according to the bypass path or the regular path may be signaled in the bitstream with higher level syntax. Consistent with the CABAC process described above, in the case where arithmetic decoding unit 106 receives bin values from a bypass path, bypass decoding engine 108 may perform arithmetic decoding on bin values without utilizing a context assigned to a bin value. In one example, bypass decoding engine 108 may assume equal probabilities for possible values of a bin.
  • context modeling unit 112 may provide a context variable, such that regular decoding engine 110 may perform arithmetic decoding based on the context assignments provided by context modeling unit 112.
  • the context assignments may be defined according to a video coding standard, such as HEVC.
  • the context models may be stored in memory.
  • Context modeling unit 112 may include a series of indexed tables and/or utilize mapping functions to determine a context and a context variable for a particular portion of an encoded bitstream. Further, in one example, context modeling unit 112 and/or entropy decoding unit 90 may be configured to assign contexts to bins of the significance syntax elements based on techniques described herein.
  • regular decoding engine 110 may update a context based on the decoded bin values. Further, inverse binarization unit 114 may perform an inverse binarization on a bin value and use a bin matching function to determine if a bin value is valid. The inverse binarization unit 114 may also update the context modeling unit based on the matching determination. Thus, the inverse binarization unit 114 outputs syntax elements according to a context adaptive decoding technique.
  • intra prediction unit 96 of prediction processing unit 92 may generate prediction data for a video block of the current video slice based on a signaled intra prediction mode and data from previously decoded blocks of the current frame or picture.
  • motion compensation unit 94 of prediction processing unit 92 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 90 .
  • the predictive blocks may be produced from one of the reference pictures within one of the reference picture lists.
  • Video decoder 42 may construct the reference picture lists, List 0 and List 1, using default construction techniques based on reference pictures stored in reference picture memory 104 .
  • Motion compensation unit 94 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, motion compensation unit 94 uses some of the received syntax elements to determine a prediction mode (e.g., intra- or inter-prediction) used to code the video blocks of the video slice, an inter-prediction slice type (e.g., B slice or P slice), construction information for one or more of the reference picture lists for the slice, motion vectors for each inter-encoded video block of the slice, inter-prediction status for each inter-coded video block of the slice, and other information to decode the video blocks in the current video slice.
  • Motion compensation unit 94 may also perform interpolation based on interpolation filters. Motion compensation unit 94 may use interpolation filters as used by video encoder 32 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 94 may determine the interpolation filters used by video encoder 32 from the received syntax elements and use the interpolation filters to produce predictive blocks.
  • Inverse quantization processing unit 98 inverse quantizes, i.e., de-quantizes, the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 90 .
  • the inverse quantization process may include use of a quantization parameter calculated by video encoder 32 for each video block in the video slice to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied.
  • Inverse transform processing unit 100 applies an inverse transform, e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain.
  • inverse transform processing unit 100 may apply a 2-dimensional (2-D) inverse transform (in both the horizontal and vertical direction) to the coefficients.
  • inverse transform processing unit 100 may instead apply a horizontal 1-D inverse transform, a vertical 1-D inverse transform, or no transform to the residual data in each of the TUs.
  • the type of transform applied to the residual data at video encoder 32 may be signaled to video decoder 42 to apply an appropriate type of inverse transform to the transform coefficients.
  • video decoder 42 forms a decoded video block by summing the residual blocks from inverse transform processing unit 100 with the corresponding predictive blocks generated by motion compensation unit 94 .
  • Summer 102 represents the component or components that perform this summation operation.
  • a deblocking filter may also be applied to filter the decoded blocks in order to remove blockiness artifacts.
  • Other loop filters may also be used to smooth pixel transitions, or otherwise improve the video quality.
  • the decoded video blocks in a given frame or picture are then stored in reference picture memory 104, which stores reference pictures used for subsequent motion compensation.
  • Reference picture memory 104 also stores decoded video for later presentation on a display device, such as display device 44 of FIG. 3.
  • FIG. 9 is a flowchart illustrating an example process for decoding video data according to this disclosure. Although the process in FIG. 9 is described below as generally being performed by video decoder 42, the process may be performed by any combination of video decoder 42, entropy decoding unit 90, and/or context modeling unit 112.
  • video decoder 42 receives, from a coded bitstream, significance syntax elements (e.g., significance flags) for transform coefficients of a block (116).
  • Video decoder 42 determines a scan order for the transform coefficients (118).
  • Video decoder 42 determines contexts for the transform coefficients based on the determined scan order (120).
  • video decoder 42 also determines the block size and determines the contexts based on the determined scan order and block size.
  • video decoder 42 determines the contexts based on the determined scan order, positions of the transform coefficients within the block, and a size of the block.
  • video decoder 42 may determine the same context if the scan order is either the horizontal scan or the vertical scan, and determine a different context if the scan order is not the horizontal scan or the vertical scan.
  • Video decoder 42 CABAC decodes the significance syntax elements (e.g., significance flags) based on the determined contexts (122).
  • Video encoder 32 (as described in the flowchart of FIG. 6) and video decoder 42 (as described in the flowchart of FIG. 9) may be configured to determine contexts that are the same if the determined scan order is a horizontal scan or a vertical scan, and to determine contexts that are different from the horizontal- and vertical-scan contexts if the determined scan order is not the horizontal scan or the vertical scan (e.g., is a diagonal scan).
  • video encoder 32 and video decoder 42 may be configured to determine a first set of contexts for the significance syntax elements if the scan order is a first scan order, and determine a second set of contexts for the significance syntax elements if the scan order is a second scan order.
  • the first set of contexts is the same as the second set of contexts if the first scan order is a horizontal scan and the second scan order is a vertical scan.
  • the first set of contexts is different than the second set of contexts if the first scan order is one of a horizontal scan or a vertical scan and the second scan order is not the horizontal scan or the vertical scan.
  • video encoder 32 and video decoder 42 may determine whether the size of the block is a first size or a second size.
  • One example of the first size is the 4×4 block, and one example of the second size is the 8×8 block. If the size of the block is the first size (e.g., the 4×4 block), video encoder 32 and video decoder 42 may determine contexts that are the same for all scan orders (e.g., contexts that are the same for the diagonal, horizontal, and vertical scans of the 4×4 block).
  • If the size of the block is the second size (e.g., the 8×8 block), video encoder 32 and video decoder 42 may determine contexts that are different for at least two different scan orders (e.g., the contexts for the diagonal scan of the 8×8 block are different from the contexts for the horizontal or vertical scan of the 8×8 block, but the contexts for the horizontal and vertical scans of the 8×8 block may be the same).
  • The techniques described above are described with respect to certain examples, such as transform coefficients resulting from intra-coding; however, the techniques may be applicable to other examples as well, such as inter-coding.
  • the following techniques can be used individually or in conjunction with any of the other techniques described in this disclosure.
  • the techniques described above may be used in conjunction with any of the following techniques, or may be implemented separately from any of the following techniques.
  • video encoder 32 and video decoder 42 may utilize one scan order to determine the location of the last significant coefficient. Video encoder 32 and video decoder 42 may utilize a different scan order to determine neighborhood contexts for the transform coefficients. Video encoder 32 and video decoder 42 may then code significance flags, level information, and sign information based on the determined neighborhood contexts. For example, video encoder 32 and video decoder 42 may utilize a horizontal or vertical scan (referred to as the nominal scan) to identify the last significant transform coefficient, and then utilize a diagonal scan on the 4×4 blocks or 4×4 sub-blocks (in the case of an 8×8 block) to determine the neighborhood contexts.
  • the position of the last significant coefficient in the scan order is coded in the bit-stream. This is followed by the significance map for a subset of 16 coefficients (a 4×4 sub-block in the case of a 4×4 sub-block based diagonal scan) in backwards scan order, followed by coding passes for level information and sign. It should be noted that the position of the last significant coefficient depends directly on the specific scan that is used. An example of this is shown in FIG. 10.
  • the last significant coefficient position is still determined and coded based on the nominal scan. But then, for coding significance, level, and sign information, the block is scanned using a 4×4 sub-block based diagonal scan starting with the bottom-right coefficient and proceeding backwards to the DC coefficient. If it can be derived from the position of the last significant coefficient that a particular coefficient is not significant, no significance, level, or sign information is coded for that coefficient.
  • FIG. 11 is a conceptual diagram illustrating use of a diagonal scan in place of an original horizontal scan.
  • FIG. 11 illustrates block 130 .
  • the coefficients with solid fill are significant.
  • the position of the last significant coefficient, assuming a horizontal scan, is (1, 1) (transform coefficient 132). All coefficients with row indices greater than 1 can be inferred to be not significant. Similarly, all coefficients with row index 1 and column index greater than 1 can be inferred to be not significant. Finally, the coefficient at (1, 1) itself can be inferred to be significant; its level and sign information cannot be inferred. For coding of significance, level, and sign information, a backward 4×4 sub-block based diagonal scan is used, as sketched below.
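  • The inference just described can be sketched as follows for a nominal horizontal (row-by-row, left-to-right) scan with the last significant coefficient at (lastRow, lastCol); the enum and function names are illustrative.

```cpp
// What can be derived about a coefficient's significance from the last
// significant position under a nominal horizontal scan.
enum Inference { INFERRED_ZERO, INFERRED_SIGNIFICANT, MUST_BE_CODED };

Inference inferSignificance(int row, int col, int lastRow, int lastCol) {
    if (row > lastRow || (row == lastRow && col > lastCol))
        return INFERRED_ZERO;        // lies after the last significant coefficient
    if (row == lastRow && col == lastCol)
        return INFERRED_SIGNIFICANT; // the last significant coefficient itself
    return MUST_BE_CODED;            // significance flag is explicitly coded
}
```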
  • the significance flags are encoded.
  • the significance flags that can be inferred are not explicitly coded.
  • a neighborhood based context is used for coding of significance flags.
  • the neighborhood may be the same as that used for 16×16 and 32×32 blocks, or a different neighborhood may be used. It should be noted that, similar to above, separate sets of neighborhood-based contexts may be used for the different scans (horizontal, vertical, and 4×4 sub-block). Also, the contexts may be shared between different block sizes.
  • any of various techniques, such as those of JCTVC-H0228, may be used for coding significance, level, and sign information for 4×4 and 8×8 blocks after the position of the last significant coefficient is coded assuming the nominal scan.
  • a 4×4 sub-block based diagonal scan may be used for coding significance, level, and sign information.
  • the method is not restricted to horizontal, vertical, and 4×4 sub-block based diagonal scans.
  • the basic principle is to send the last significant coefficient position assuming the nominal scan and then code the significance (and possibly level and sign) information using another scan which uses neighborhood based contexts.
  • Although the techniques have been described for 4×4 and 8×8 blocks, they can be extended to any block size where horizontal and/or vertical scans may be used.
  • the video coder may determine which context to use for coding a transform coefficient based on the row index or the column index of the transform coefficient. For example, for a horizontal scan, all transform coefficients in the same row may share the same context, and the video coder may utilize different contexts for transform coefficients in different rows. For a vertical scan, all transform coefficients in the same column may share the same context, and the video coder may utilize different contexts for transform coefficients in different columns.
  • JCTVC-H0228 uses the sum of row and column indices to determine the context set. In the case of JCTVC-H0228, this is done even for horizontal and vertical scans.
  • the context set used to code the significance or level for a particular coefficient for horizontal scan may depend only on the row index of the coefficient.
  • the context set to code the significance or level for a coefficient in case of vertical scan may depend only on the column index of the coefficient.
  • the context set may depend only on the absolute index of the coefficient in the scan. Different scans may use different functions to derive the context set.
  • horizontal, vertical, and 4×4 sub-block-based diagonal scans may use separate context sets, or the horizontal and vertical scans may share context sets.
  • In some examples, not only the context set but also the context itself depends only on the absolute index of the coefficient in the scanning order.
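  • A minimal sketch of those index-based derivations; grouping indices into pairs is an assumed placeholder chosen only to show the shape of the mapping, not a normative rule.

```cpp
// Derive the context set from a single coordinate chosen per scan order:
// the row for a horizontal scan, the column for a vertical scan, and the
// absolute position in the scan for the sub-block diagonal scan.
enum ScanOrder { HORIZONTAL, VERTICAL, SUBBLOCK_DIAGONAL };

int contextSetForCoefficient(ScanOrder scan, int row, int col, int scanIndex) {
    switch (scan) {
        case HORIZONTAL: return row / 2;       // depends only on the row index
        case VERTICAL:   return col / 2;       // depends only on the column index
        default:         return scanIndex / 2; // depends only on the scan index
    }
}
```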
  • the video coder (e.g., video encoder 32 or video decoder 42 ) may be configured to implement only one type of scan (e.g., a diagonal scan).
  • the neighboring regions that the video coder evaluates may be based on the nominal scan.
  • the nominal scan is the scan the video coder would have performed had the video coder been able to perform other scans.
  • video encoder 32 may signal that the horizontal scan is to be used.
  • video decoder 42 may implement the diagonal scan instead, but the neighboring regions that the video coder evaluates may be based on the signaling that the horizontal scan is to be used. The same would apply for the vertical scan.
  • the video coder may stretch the neighboring region that is evaluated in the horizontal direction relative to the regions that are currently used. The same would apply when the nominal scan is the vertical scan, but in the vertical direction.
  • the stretching of the neighboring region may be referred to as varying the region. For example, if the nominal scan is horizontal, then rather than evaluating a transform coefficient that is two rows down from where the current transform coefficient being coded is located, the video coder may evaluate the transform coefficient that is three columns apart from where the current transform coefficient is located. The same would apply when the nominal scan is the vertical scan, but the transform coefficient would be located three rows apart from where the current transform coefficient (e.g., the one being coded) is located.
  • FIG. 12 is a conceptual diagram illustrating a context neighborhood for a nominal horizontal scan.
  • FIG. 12 illustrates 8×8 block 134 that includes 4×4 sub-blocks 136A-136D.
  • the coefficient two rows down has been replaced by the coefficient that is in the same row but three columns apart (X4).
  • a context neighborhood that is stretched in the vertical direction may be used.
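  • The stretched neighborhoods described above can be written as offset templates relative to the coefficient being coded; the five-neighbor baseline assumed here is illustrative, with offsets given as (rows down, columns right).

```cpp
struct Offset { int dRow, dCol; };

// Assumed baseline template: right 1, right 2, down 1, diagonal, down 2.
static const Offset kBaseline[5]   = {{0, 1}, {0, 2}, {1, 0}, {1, 1}, {2, 0}};

// Stretched horizontally for a nominal horizontal scan: the neighbor two
// rows down, {2, 0}, is replaced by one three columns apart, {0, 3}.
static const Offset kHorizontal[5] = {{0, 1}, {0, 2}, {1, 0}, {1, 1}, {0, 3}};

// Stretched vertically for a nominal vertical scan (assumed mirror case):
// the neighbor two columns apart, {0, 2}, becomes one three rows down, {3, 0}.
static const Offset kVertical[5]   = {{0, 1}, {3, 0}, {1, 0}, {1, 1}, {2, 0}};
```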
  • Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol.
  • Computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure.
  • a computer program product may include a computer-readable medium.
  • Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry.
  • Accordingly, the term “processor,” as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein.
  • the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
  • the techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set).
  • Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Abstract

Techniques are described for determining a scan order for transform coefficients of a block. The techniques may determine context for encoding or decoding significance syntax elements for the transform coefficients based on the determined scan order. A video encoder may encode the significance syntax elements and a video decoder may decode the significance syntax elements based on the determined contexts.

Description

    RELATED APPLICATIONS
  • This application claims the benefit of:
    • U.S. Provisional Application No. 61/625,039, filed Apr. 16, 2012, and
    • U.S. Provisional Application No. 61/667,382, filed Jul. 2, 2012, the entire content of each of which is incorporated by reference herein.
    TECHNICAL FIELD
  • This disclosure relates to video coding and, more particularly, to techniques for coding syntax elements associated with transform coefficients used in video coding.
  • BACKGROUND
  • Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called “smart phones,” video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques defined according to video coding standards. Digital video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video compression techniques. Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions. In addition, High-Efficiency Video Coding (HEVC) is a video coding standard being developed by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG).
  • Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (i.e., a video frame or a portion of a video frame) may be partitioned into video blocks, which may also be referred to as treeblocks, coding units (CUs) and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.
  • Spatial or temporal prediction results in a predictive block for a block to be coded. Residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and the residual data indicating the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients, which then may be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned in order to produce a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.
  • SUMMARY
  • In general, this disclosure describes techniques for encoding and decoding data representing syntax elements (e.g., significance flags) associated with transform coefficients of a block. In some techniques, a video encoder and a video decoder each determines contexts to be used for context adaptive binary arithmetic coding (CABAC). As described in more detail, the video encoder and the video decoder determine a scan order for the block, and determine the contexts based on the scan order. In some examples, the video decoder determines contexts that are the same for two or more scan orders, and different contexts for other scan orders. Similarly, in these examples, the video encoder determines contexts that are the same for the two or more scan orders, and different contexts for the other scan orders.
  • In one example, the disclosure describes a method for decoding video data. The method comprising receiving, from a coded bitstream, significance flags of transform coefficients of a block, determining a scan order for the transform coefficients of the block, determining contexts for the significance flags of the transform coefficients of the block based on the determined scan order, and context adaptive binary arithmetic coding (CABAC) decoding the significance flags of the transform coefficients based at least on the determined contexts.
  • In another example, the disclosure describes a method for encoding video data. The method comprising determining a scan order for transform coefficients of a block, determining contexts for significance flags of the transform coefficients of the block based on the determined scan order, context adaptive binary arithmetic coding (CABAC) encoding the significance flags of the transform coefficients based at least on the determined contexts, and signaling the encoded significance flags in a coded bitstream.
  • In another example, the disclosure describes an apparatus for coding video data. The apparatus comprises a video coder configured to determine a scan order for transform coefficients of a block, determine contexts for significance flags of the transform coefficients of the block based on the determined scan order, and context adaptive binary arithmetic coding (CABAC) code the significance flags of the transform coefficients based at least on the determined contexts.
  • In another example, the disclosure describes an apparatus for coding video data. The apparatus comprises means for determining a scan order for transform coefficients of a block, means for determining contexts for significance flags of the transform coefficients of the block based on the determined scan order, and means for context adaptive binary arithmetic coding (CABAC) the significance flags of the transform coefficients based at least on the determined contexts.
  • In another example, the disclosure describes a computer-readable storage medium. The computer-readable storage medium having instructions stored thereon that when executed cause one or more processors of an apparatus for coding video data to determine a scan order for transform coefficients of a block, determine contexts for significance flags of the transform coefficients of the block based on the determined scan order, and context adaptive binary arithmetic coding (CABAC) code the significance flags of the transform coefficients based at least on the determined contexts.
  • The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
  • BRIEF DESCRIPTION OF DRAWINGS
  • FIGS. 1A-1C are conceptual diagrams illustrating examples of scan orders of a block that includes transform coefficients.
  • FIG. 2 is a conceptual diagram illustrating a mapping of transform coefficients to significance syntax elements.
  • FIG. 3 is a block diagram illustrating an example video encoding and decoding system that may utilize techniques described in this disclosure.
  • FIG. 4 is a block diagram illustrating an example video encoder that may implement techniques described in this disclosure.
  • FIG. 5 is a block diagram illustrating an example of an entropy encoder that may implement techniques for entropy encoding syntax elements in accordance with this disclosure.
  • FIG. 6 is a flowchart illustrating an example process for encoding video data according to this disclosure.
  • FIG. 7 is a block diagram illustrating an example video decoder that may implement techniques described in this disclosure.
  • FIG. 8 is a block diagram illustrating an example of an entropy decoder that may implement techniques for decoding syntax elements in accordance with this disclosure.
  • FIG. 9 is a flowchart illustrating an example process of decoding video data according to this disclosure.
  • FIG. 10 is a conceptual diagram illustrating positions of a last significant coefficient depending on the scan order.
  • FIG. 11 is a conceptual diagram illustrating use of a diagonal scan in place of an original horizontal scan.
  • FIG. 12 is a conceptual diagram illustrating a context neighborhood for a nominal horizontal scan.
  • DETAILED DESCRIPTION
  • A video encoder determines transform coefficients for a block, encodes syntax elements, that indicate the values of the transform coefficients, using context adaptive binary arithmetic coding (CABAC), and signals the encoded syntax elements in a bitstream. A video decoder receives the bitstream that includes the encoded syntax elements that indicate the values of the transform coefficients and CABAC decodes the syntax elements to determine the transform coefficients for the block.
  • The video encoder and video decoder determine which contexts are to be used to perform CABAC encoding and CABAC decoding, respectively. In the techniques described in this disclosure, the video encoder and the video decoder may determine which contexts to use to perform CABAC encoding or CABAC decoding based on a scan order of the block of the transform coefficients. In some examples, the video encoder and the video decoder may determine which contexts to use to perform CABAC encoding or CABAC decoding based on a size of the block, positions of the transform coefficients within the block, and the scan order.
  • In some examples, the video encoder and the video decoder may utilize different contexts for different scan orders (i.e., a first set of contexts for horizontal scan, a second set of contexts for vertical scan, and a third set of contexts for diagonal scan). As another example, if the block of transform coefficients is scanned vertically or horizontally, the video encoder and the video decoder may utilize the same contexts for both of these scan orders (e.g., for a particular position of a transform coefficient).
  • By determining which contexts to use for CABAC encoding or CABAC decoding, the techniques described in this disclosure may exploit the statistical behavior of the magnitudes of the transform coefficients in a way that achieves better video compression, as compared to other techniques. For instance, it may be possible for the video encoder and the video decoder to determine which contexts to use for CABAC encoding or CABAC decoding based on the position of the transform coefficient, irrespective of the scan order. However, the scan order may have an effect on the ordering of the transform coefficients.
  • For example, the block of transform coefficients may be a two-dimensional (2D) block of coefficients that the video encoder scans to construct a one-dimensional (1D) vector, and the video encoder entropy encodes (using CABAC) the values of the transform coefficients in the 1D vector. The order in which the video encoder places the values (e.g., magnitudes) of the transform coefficients in the 1D vector is a function of the scan order. The order in which the video encoder places the magnitudes of the transform coefficients for a diagonal scan may be different than the order in which the video encoder places the magnitudes of the transform coefficients for a vertical scan.
  • In other words, the position of the magnitudes of the transform coefficients may be different for different scan orders. The position of the magnitudes of the transform coefficients may have an effect on coding efficiency. For instance, the location of the last significant coefficient, in the block, may be different for different scan orders. In this case, the magnitude of the last significant coefficient may be different for different scan orders.
  • Accordingly, these other techniques that determine contexts based on the position of the transform coefficient irrespective to the scan order fail to properly account for the potential that the significance statistics for a transform coefficient in a particular position may vary depending on the scan order. In the techniques described in this disclosure, the video encoder and video decoder may determine the scan order for the block, and determine contexts based on the determined scan order (and in some examples, also based on the positions of the transform coefficients and possibly the size of the block). This way, the video encoder and video decoder may better account for the significance statistics for determining which contexts to use as compared to techniques that do not rely on the scan order and rely only on the position for determining which contexts to use.
  • In some examples of video coding, the video encoder and the video decoder may use five coding passes to encode or decode transform coefficients of a block, namely, (1) a significance pass, (2) a greater than one pass, (3) a greater than two pass, (4) a sign pass, and (5) a coefficient level remaining pass. The techniques of this disclosure, however, are not necessarily limited to five pass scenarios. In general, significance coding refers to generating syntax elements to indicate whether any of the coefficients within the block have an absolute value of one or greater. That is, a coefficient with an absolute value of one or greater is considered “significant.” The other coding passes are described in more detail below.
  • During the significance pass, the video encoder determines syntax elements that indicate whether a transform coefficient is significant. Syntax elements that indicate whether a transform coefficient is significant are referred to herein as significance syntax elements. One example of a significance syntax element is a significance flag, where a value of 0 for the significance flag indicates that the coefficient is not significant (i.e., the value of the transform coefficient is 0) and a value of 1 for the significance flag indicates that the coefficient is significant (i.e., the value of the transform coefficient is non-zero).
  • To perform the significance pass, the video encoder scans the transform coefficients of a block, or part of the block (if the position of the last significant coefficient is previously determined and signaled to the decoder), and determines the significance syntax element for each transform coefficient. There are various examples of the scan order, such as a horizontal scan, a vertical scan, and a diagonal scan. The video encoder CABAC encodes the significance syntax elements and signals the encoded significance syntax elements in a coded bitstream. Other types of scans, such as zig-zag scans and adaptive or partially adaptive scans, may also be used in some examples. A minimal sketch of this flag derivation appears below.
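  • The sketch below (not the disclosure's own code; names are illustrative) derives the significance flags directly from the coefficient values, matching the mapping described above.

        /* Derive significance flags for an n-by-n block of quantized
         * transform coefficients stored in raster order: a coefficient is
         * "significant" (flag = 1) exactly when its value is non-zero. */
        void DeriveSignificanceFlags(const int *coeffs,
                                     unsigned char *sigFlags, int n) {
            for (int i = 0; i < n * n; ++i)
                sigFlags[i] = (coeffs[i] != 0) ? 1 : 0;
        }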
  • To apply CABAC coding, a syntax element may first be binarized to form a series of one or more bits, which are referred to as “bins.” In addition, a coding context may be associated with a bin of the syntax element. The coding context may identify probabilities of coding bins having particular values. For instance, a coding context may indicate a 0.7 probability of coding a 0-valued bin (representing an example of a “most probable symbol,” in this instance) and a 0.3 probability of coding a 1-valued bin. After identifying the coding context, a bin may be arithmetically coded based on the context. In some cases, contexts associated with a particular syntax element, or bins thereof, may be dependent on other syntax elements or coding parameters.
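  • A coding context can be pictured as a small adaptive probability state. The C sketch below is an illustrative simplification (a floating-point estimator rather than the table-driven state machine actually used by CABAC): after each coded bin, the estimate of the probability of a 1-valued bin is nudged toward the value just observed.

        /* Simplified adaptive context: tracks P(bin == 1) and adapts toward
         * the bins actually coded. Real CABAC uses a table-driven
         * finite-state estimator; this version only illustrates the
         * principle of context adaptation. */
        typedef struct { double p1; } Context;

        static void ContextInit(Context *ctx) { ctx->p1 = 0.5; }

        static void ContextUpdate(Context *ctx, int bin) {
            const double alpha = 0.95;  /* adaptation rate, illustrative */
            ctx->p1 = alpha * ctx->p1 + (bin ? (1.0 - alpha) : 0.0);
        }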
  • In the techniques described in this disclosure, the video encoder may determine which contexts to use for the CABAC encoding based on the scan order. The video encoder may use one set of contexts per scan order type. For example, if the block is a 4×4 block, there are sixteen coefficients. In this example, the video encoder may utilize sixteen contexts for each scan resulting in a total of forty-eight contexts (i.e., sixteen contexts for horizontal scan, sixteen contexts for vertical scan, and sixteen contexts for diagonal scan for a total of forty-eight contexts). The same would hold for an 8×8 block, but with a total of 192 contexts (i.e., sixty-four contexts for horizontal scan, sixty-four contexts for vertical scan, and sixty-four contexts for diagonal scan for a total of 192 contexts). However, the example of forty-eight or 192 contexts is provided for purposes of illustration only. It may be possible that the number of contexts for each block is a function of block size.
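  • Under the no-sharing arrangement just described (sixteen contexts per scan order for a 4×4 block, sixty-four per scan order for an 8×8 block), a context index could be computed as in this hypothetical sketch, with one contiguous bank of position-based contexts per scan order:

        enum ScanOrder { SCAN_DIAG = 0, SCAN_HORZ = 1, SCAN_VERT = 2 };

        /* Context index when each scan order has its own bank of contexts:
         * a 4x4 block uses 3 * 16 = 48 contexts in total and an 8x8 block
         * uses 3 * 64 = 192. 'pos' is the coefficient position within the
         * n-by-n block (0 .. n*n - 1). */
        int SigCtxIndexNoSharing(enum ScanOrder scan, int n, int pos) {
            int contextsPerScan = n * n;  /* 16 for 4x4, 64 for 8x8 */
            return (int)scan * contextsPerScan + pos;
        }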
  • The video decoder receives the coded bitstream (e.g., from the video encoder directly or via a storage medium that stores the coded bitstream) and performs a reciprocal function to that of the video encoder to determine the values of the transform coefficients. For example, the video decoder implements the significance pass to determine which transform coefficients are significant based on the significance syntax elements in the received bitstream.
  • In the techniques described in this disclosure, the video decoder may determine the scan order of the transform coefficients of the block (e.g., the scan order in which the transform coefficients were scanned). The video decoder may determine which contexts to use for CABAC decoding the significance syntax elements based on the scan order (e.g., sixteen of the forty-eight contexts for a 4×4 block or sixty-four of the 192 contexts for an 8×8 block). In this manner, the video decoder may select the same contexts for CABAC decoding that video encoder selected for CABAC encoding. The video decoder CABAC decodes the significance syntax elements based on the determined contexts.
  • In the above examples, the video encoder and the video decoder determined contexts based on the scan order, where the contexts were different for different scan orders resulting in a total of forty-eight contexts for a 4×4 block and 192 contexts for an 8×8 block. However, the techniques described in this disclosure are not limited in this respect. Alternatively, in some examples, the contexts that the video encoder and the video decoder use may be the same contexts for multiple (i.e., two or more) scan orders to allow for context sharing depending on scan order type.
  • As one example, the video encoder and the video decoder may determine contexts that are the same if the scan order is a horizontal scan or if the scan order is a vertical scan. In other words, the contexts are the same if the scan order is the horizontal scan or if the scan order is the vertical scan for a particular position of the transform coefficient within the block. The video encoder and the video decoder may utilize different contexts for the diagonal scan. In this example, the number of contexts for the 4×4 block reduces from forty-eight contexts to thirty-two contexts and for the 8×8 block reduces from 192 contexts to 128 because the contexts for the horizontal scan and the vertical scan are the same, and there are different contexts for the diagonal scan.
  • As another example, it may be possible for the video encoder and the video decoder to use the same contexts for all scan order types, which reduces the contexts to sixteen for the 4×4 block and sixty-four for the 8×8 block. However, using the same contexts for all scan order types may be a function of the block size. For example, for certain block sizes, it may be possible to use the same contexts for all scan orders, and for certain other blocks sizes, the contexts may be different for the different scan orders, or two or more of the scan orders may share contexts.
  • For instance, for an 8×8 block, the contexts for the horizontal and vertical scans may be the same (e.g., for a particular position), and different for the diagonal scan. For the 4×4, 16×16, and 32×32 blocks, the contexts may be different for different scan orders. Moreover, in some other techniques that relied on position, the contexts for the 2D block and the 1D block may be different. In the techniques described in this disclosure, when contexts are shared for all scan orders, the contexts for the 2D block or the 1D block may be the same.
  • In some examples, in addition to utilizing the scan order to determine the contexts, the video encoder and the video decoder may account for the size of the block. For instance, in the above example, the size of the block indicated whether all scan orders share contexts. In some examples, the video encoder and the video decoder may determine which contexts to use based on the size of the block and the scan order. In these examples, the techniques described in this disclosure may allow for context sharing. For instance, for a block with a first size, the video encoder and the video decoder may determine contexts that are the same if the block of the first size is scanned horizontally or if the block of the first size is scanned vertically. For a block with a second size, the video encoder and the video decoder may determine contexts that are the same if the block of the second size is scanned horizontally or if the block of the second size is scanned vertically.
  • There may be other variations to these techniques. For example, for certain sized blocks (e.g., 16×16 or 32×32), the video encoder and the video decoder determine a first set of contexts that are used for CABAC encoding or CABAC decoding for all scan orders. For certain sized blocks (e.g., 8×8), the video encoder and the video decoder determines a second set of contexts that are used for CABAC encoding or CABAC decoding for a diagonal scan, and a third set of contexts that are used for CABAC encoding or CABAC decoding for both a horizontal scan and a vertical scan. For certain sized blocks (e.g., 4×4), the video encoder and the video decoder determine a fourth set of contexts that are used for CABAC encoding or CABAC decoding for a diagonal scan, a horizontal scan and a vertical scan.
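  • The size-dependent variant above can be expressed as a small lookup; the set identifiers in this sketch are hypothetical labels for the first through fourth sets of contexts described in the preceding example (the ScanOrder enum is redeclared here so the sketch is self-contained).

        enum ScanOrder { SCAN_DIAG = 0, SCAN_HORZ = 1, SCAN_VERT = 2 };

        /* Hypothetical context-set selection:
         *   16x16 and 32x32 -> set 0 for every scan order,
         *   8x8             -> set 1 for the diagonal scan,
         *                      set 2 shared by horizontal and vertical scans,
         *   4x4             -> set 3 for every scan order. */
        int SelectContextSet(int blockSize, enum ScanOrder scan) {
            if (blockSize >= 16) return 0;
            if (blockSize == 8)  return (scan == SCAN_DIAG) ? 1 : 2;
            return 3;  /* 4x4 */
        }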
  • In some cases, the examples of determining contexts based on the scan order may be directed to intra-coding modes. For example, the transform coefficients may be the result from intra-coding, and the techniques described in this disclosure may be applicable to such transform coefficients. However, the techniques described in this disclosure are not so limited and may be applicable for inter-coding or intra-coding.
  • FIGS. 1A-1C are conceptual diagrams illustrating examples of scan orders of a block that includes transform coefficients. A block that includes transform coefficients may be referred to as a transform block (TB). A transform block may be a block of a transform unit. For example, a transform unit includes three transform blocks and the corresponding syntax elements. A transform unit may be a transform block of luma samples of size 8×8, 16×16, or 32×32, or four transform blocks of luma samples of size 4×4, together with two corresponding transform blocks of chroma samples, of a picture that has three sample arrays; or a transform block of luma samples of size 8×8, 16×16, or 32×32, or four transform blocks of luma samples of size 4×4, of a monochrome picture or a picture that is coded using separate color planes; and the syntax structures used to transform the transform block samples.
  • FIG. 1A illustrates a horizontal scan of 4×4 block 10 (e.g., TB 10) that includes transform coefficients 12A to 12P (collectively referred to as “transform coefficients 12”). For example, the horizontal scan starts from transform coefficient 12P and ends at transform coefficient 12A, and proceeds horizontally through the transform coefficients.
  • FIG. 1B illustrates a vertical scan of 4×4 block 14 (e.g., TB 14) that includes transform coefficients 16A to 16P (collectively referred to as “transform coefficients 16”). For example, the vertical scan starts from transform coefficient 16P and ends at transform coefficient 16A, and proceeds vertically through the transform coefficients.
  • FIG. 1C illustrates a diagonal scan of 4×4 block 18 (e.g., TB 18) that includes transform coefficients 20A to 20P (collectively referred to as “transform coefficients 20”). For example, the diagonal scan starts from transform coefficient 20P and ends at transform coefficient 20A, and proceeds diagonally through the transform coefficients.
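  • For reference, the three scan patterns of FIGS. 1A-1C can be generated as coordinate tables; the C sketch below (hypothetical helper names) fills forward-order tables with the DC coefficient first, which a coder would then walk in reverse, from the last coefficient toward the first, as the figures illustrate.

        /* Fill scanR/scanC with the (row, column) visiting order, DC
         * coefficient first, for an n-by-n block. */
        void BuildHorizontalScan(int n, int *scanR, int *scanC) {
            int k = 0;
            for (int r = 0; r < n; ++r)
                for (int c = 0; c < n; ++c) { scanR[k] = r; scanC[k] = c; ++k; }
        }

        void BuildVerticalScan(int n, int *scanR, int *scanC) {
            int k = 0;
            for (int c = 0; c < n; ++c)
                for (int r = 0; r < n; ++r) { scanR[k] = r; scanC[k] = c; ++k; }
        }

        /* Diagonal scan: each anti-diagonal is traversed from its
         * bottom-left element to its top-right element. */
        void BuildDiagonalScan(int n, int *scanR, int *scanC) {
            int k = 0;
            for (int d = 0; d <= 2 * (n - 1); ++d)
                for (int r = n - 1; r >= 0; --r) {
                    int c = d - r;
                    if (c >= 0 && c < n) { scanR[k] = r; scanC[k] = c; ++k; }
                }
        }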
  • It should be understood that although FIGS. 1A-1C illustrate starting from the last transform coefficient and ending on the first transform coefficient, the techniques of this disclosure are not so limited. In some examples, the video encoder may determine the location of the last significant coefficient (e.g., the last transform coefficient with a non-zero value) in the block. The video encoder may scan starting from the last significant coefficient and ending on the first transform coefficient. The video encoder may signal the location of the last significant coefficient in the coded bitstream (i.e., x and y coordinate of the last significant coefficient), and the video decoder may receive the location of the last significant coefficient from the coded bitstream. In this manner, the video decoder may determine that subsequent syntax elements for the transform coefficients (e.g., the significance syntax elements) are for transform coefficients starting from the last significant coefficient and ending on the first transform coefficient.
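  • Given a forward scan table like those above, locating the last significant coefficient reduces to finding the largest scan position that holds a non-zero value, as in this minimal sketch (names are illustrative):

        /* Return the scan position of the last significant coefficient, or
         * -1 if the block is entirely zero. 'coeffs' is in raster order;
         * scanR/scanC give the forward scan order (DC coefficient first). */
        int FindLastSignificant(const int *coeffs, int n,
                                const int *scanR, const int *scanC) {
            for (int k = n * n - 1; k >= 0; --k)
                if (coeffs[scanR[k] * n + scanC[k]] != 0)
                    return k;
            return -1;
        }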
  • Although FIGS. 1A-1C are illustrated as 4×4 blocks, the techniques described in this disclosure are not so limited, and the techniques can be extended to other sized blocks. Moreover, in some cases, one or more of 4×4 blocks 10, 14, and 18 may be sub-blocks of a larger block. For example, an 8×8 block can be divided into four 4×4 sub-blocks, a 16×16 block can be divided into sixteen 4×4 sub-blocks, and so forth, and one or more of 4×4 blocks 10, 14, and 18 may be sub-blocks of the 8×8 block or 16×16 block. Examples of sub-block horizontal and vertical scans are described in: (1) Rosewarne, C., Maeda, M., “Non-CE11: Harmonisation of 8×8 TU residual scan,” JCT-VC Contribution JCTVC-H0145; (2) Yu, Y., Panusopone, K., Lou, J., Wang, L., “Adaptive Scan for Large Blocks for HEVC,” JCT-VC Contribution JCTVC-F569; and (3) U.S. patent application Ser. No. 13/551,458, filed Jul. 17, 2012, each of which is hereby incorporated by reference.
  • Transform coefficients 12, 16, and 20 represent transformed residual values between a block that is being predicted and another block. The video encoder generates significance syntax elements that indicate whether the values of transform coefficients 12, 16, and 20 are zero or non-zero, encodes the significance syntax elements, and signals the encoded significance syntax elements in a coded bitstream. The video decoder receives the coded bitstream and decodes the significance syntax elements as part of the process of determining transform coefficients 12, 16, and 20.
  • For encoding and decoding, the video encoder and the video decoder determine contexts that are to be used for context adaptive binary arithmetic coding (CABAC) encoding and decoding. In the techniques described in this disclosure, to determine the contexts for the significance syntax elements for transform coefficients 12, 16, and 20, the video encoder and the video decoder account for the scan order.
  • For example, if the video encoder and the video decoder determine that the scan order is a horizontal scan, then the video encoder and the video decoder may determine a first set of contexts for the sixteen transform coefficients 12 of TB 10. If the video encoder and the video decoder determine that the scan order is a vertical scan, then the video encoder and the video decoder may determine a second set of contexts for the sixteen transform coefficients 16 of TB 14. If the video encoder and the video decoder determine that the scan order is a diagonal scan, then the video encoder and the video decoder may determine a third set of contexts for the sixteen transform coefficients 20 of TB 18.
  • In this example, assuming no context sharing, there are a total of forty-eight contexts for the 4×4 blocks 10, 14, and 18 (i.e., sixteen contexts for each of the three scan orders). If blocks 10, 14, and 18 were 8×8 sized blocks, assuming no context sharing, then there would be sixty-four contexts for each of the three 8×8 sized blocks, for a total of 192 contexts (i.e., sixty-four contexts for each of the three scan orders).
  • As described in more detail, in some examples, it may be possible for two or more scan orders to share contexts. For example, two or more of the first set of contexts, second set of contexts, and the third set of contexts may be the same set of contexts. For instance, the first set of contexts for the horizontal scan may be the same as the second set of contexts for the vertical scan. In some cases, the first, second, and third contexts may be the same set of contexts.
  • In the above examples, the video encoder and the video decoder determine from a first, second, and third set of contexts the contexts to use for CABAC encoding and decoding based on the scan order. In some examples, the video encoder and the video decoder determine which contexts to use for CABAC encoding and decoding based on the scan order and a size of the block.
  • For example, if the block is 8×8, then the video encoder and the video decoder determine contexts from a fourth, fifth, and sixth set of contexts (one for each scan order) based on the scan order. If the block is 16×16, then the video encoder and the video decoder determine contexts from a seventh, eighth, and ninth set of contexts (one for each scan order) based on the scan order, and so forth. Similar to above, in some examples, there may be context sharing for the different sized blocks.
  • There may be variants of the above example techniques. For example, in one case, for a particular sized block (e.g., 4×4), the video encoder and video decoder determine contexts that are the same for all scan orders, but for an 8×8 sized block, the video encoder and the video decoder determine contexts that are the same for a horizontal scan and a vertical scan (e.g., for transform coefficients in particular positions), and different contexts for the diagonal scan. As another example, for larger sized blocks (e.g., 16×16 and 32×32), the video encoder and the video decoder may determine contexts that are the same for all scan orders and for both sizes. In some examples, for the 16×16 and 32×32 blocks, horizontal and vertical scans may not be applied. Other such permutations and combinations are possible, and are contemplated by this disclosure.
  • Determining which contexts to use for CABAC encoding and decoding based on the scan order may better account for the magnitudes of the transform coefficients. For example, the scan order defines the arrangement of the transform coefficients. As one example, the magnitude of the first transform coefficient (referred to as the DC coefficient) is generally the highest. The magnitude of the second transform coefficient is the next highest (on average, but not necessarily), and so forth. However, the location of the second transform coefficient is based on the scan order. For example, in FIG. 1A, the second transform coefficient is the transform coefficient immediately to the right of the first transform coefficient (i.e., immediately right of transform coefficient 12A). However, in FIGS. 1B and 1C, the second transform coefficient is the transform coefficient immediately below the first transform coefficient (i.e., immediately below transform coefficient 16A in FIG. 1B and immediately below transform coefficient 20A in FIG. 1C).
  • In this way, the significance statistics for a transform coefficient in a particular scan position may vary depending on the scan order. For example, in FIG. 1A, for the horizontal scan, the last transform coefficient in the first row may have much higher magnitude (on average) compared to the same transform coefficient in the vertical scan of FIG. 1B or the diagonal scan of FIG. 1C.
  • By determining which contexts to use based on the scan order, the video encoder and the video decoder may be configured to better CABAC encode or CABAC decode as compared to other techniques that do not account for the scan order. For example, it may be possible that the encoding and decoding of the significance syntax elements (e.g., significance flags) for 4×4 and 8×8 blocks is position based. For instance, there is a separate context for each position in a 4×4 block and a separate context for each 2×2 sub-block of an 8×8 block.
  • However, in this case, the context is based on the location of the transform coefficient, irrespective of the actual scan order (i.e., position based contexts for 4×4 and 8×8 blocks do not distinguish between the various scans). For example, the context for a transform coefficient located at (i, j) in the block is the same for the horizontal, vertical, and diagonal scans. As described above, the scan order may have an effect on the significance statistics for the transform coefficients, and the techniques described in this disclosure may determine contexts based on the scan order to account for the significance statistics.
  • As described above, in some examples, the video encoder and the video decoder may determine contexts that are the same for two or more scan orders. There may be various ways in which the video encoder and the video decoder may determine contexts that are the same for two or more scan orders for particular locations of transform coefficients. As one example, the horizontal and the vertical scan orders share the contexts for a particular block size by sharing contexts between the horizontal scan and a transpose of the block of the vertical scan. For instance, the video encoder and the video decoder may determine the same context for a transform coefficient (i, j) for the horizontal scan and a transform coefficient (j, i) for a vertical scan for a particular block size.
  • This instance is one example of where transform coefficients at a particular position share contexts for different scan orders. For example, the context for the transform coefficient at position (i, j) for a horizontal scan and the context for the transform coefficient at position (j, i) for a vertical scan may be the same context. In some examples, the sharing of the contexts may be applicable for 8×8 sized blocks of transform coefficients. Also, in some examples, if the scan order is not horizontal or vertical (e.g., diagonal), the context for position (i, j) and/or (j, i) may be different than for the shared context for horizontal and vertical scan.
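  • One concrete way to realize this transpose-based sharing is to map a vertical-scan position (i, j) onto the context of position (j, i), so that the horizontal and vertical scans draw from a single bank of contexts. A hedged sketch follows; the indexing scheme is illustrative, not taken from any standard.

        /* Shared significance-context index for the horizontal and vertical
         * scans of an n-by-n block: a vertical-scan coefficient at (i, j)
         * reuses the context of the horizontal-scan coefficient at (j, i). */
        int SharedSigCtxIndex(int n, int i, int j, int isVerticalScan) {
            if (isVerticalScan) { int t = i; i = j; j = t; }  /* transpose */
            return i * n + j;  /* one bank of n*n contexts serves both scans */
        }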
  • However, the techniques described in this disclosure are not so limited, and should not be considered limited to examples where the contexts for a transform coefficient (i, j) for the horizontal scan and a transform coefficient (j, i) for a vertical scan for a particular block size are the same. The following is another example manner in which the contexts for transform coefficients at particular positions are shared for different scan orders.
  • For instance, the contexts for the fourth (last) row of the block, for the horizontal scan, may be the same as the contexts for the fourth (last) column of the block, for the vertical scan; the contexts for the third row of the block, for the horizontal scan, may be the same as the contexts for the third column of the block, for the vertical scan; the contexts for the second row of the block, for the horizontal scan, may be the same as the contexts for the second column of the block, for the vertical scan; and the contexts for the first row of the block, for the horizontal scan, may be the same as the contexts for the first column of the block, for the vertical scan. The same may be applied to 8×8 blocks. There may be other example ways for the video encoder and the video decoder to determine contexts that are the same for two or more of the scan orders.
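  • Under one plausible element-wise reading of this row/column pairing (row k of the horizontal scan sharing its contexts, position by position, with column k of the vertical scan), the mapping coincides with the transpose mapping sketched earlier; the illustrative index below simply makes the grouping explicit.

        /* Row/column sharing for an n-by-n block: coefficients in row k
         * under the horizontal scan and coefficients in column k under the
         * vertical scan share the contexts of group k. */
        int RowColumnSharedCtx(int n, int row, int col, int isVerticalScan) {
            int group  = isVerticalScan ? col : row;  /* which shared set   */
            int offset = isVerticalScan ? row : col;  /* position within it */
            return group * n + offset;
        }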
  • In some examples, it may be possible for contexts to be shared between different block sizes (e.g., shared between a 4×4 block and an 8×8 block). As an example, the context for transform coefficient (1, 1) in a 4×4 block and the context for transform coefficients (2, 2), (2, 3), (3, 2), and (3, 3) in an 8×8 block may be the same, and in some examples, may be the same for a particular scan order.
  • FIG. 2 is a conceptual diagram illustrating a mapping of transform coefficients to significance syntax elements. For example, the left side of FIG. 2 illustrates transform coefficients values and the right side of FIG. 2 illustrates corresponding significance syntax elements. For all transform coefficients whose values are non-zero, there is a corresponding significance syntax element (e.g., significance flag) with a value of 1. For all transform coefficients whose values are 0, there is a corresponding significance syntax element (e.g., significance flag) with a value of 0. In the examples described in this disclosure, the video encoder and the video decoder are configured to CABAC encode and CABAC decode the example significance syntax elements illustrated in FIG. 2 by determining contexts based on the scan order, and in some examples, also based on positions of the transform coefficients and the size of the block.
  • FIG. 3 is a block diagram illustrating an example video encoding and decoding system 22 that may be configured to assign contexts utilizing the techniques described in this disclosure. As shown in FIG. 3, system 22 includes a source device 24 that generates encoded video data to be decoded at a later time by a destination device 26. Source device 24 and destination device 26 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called “smart” phones, so-called “smart” pads, televisions, cameras, display devices, digital media players, video gaming consoles, video streaming devices, or the like. In some cases, source device 24 and destination device 26 may be equipped for wireless communication.
  • Destination device 26 may receive the encoded video data to be decoded via a link 28. Link 28 may comprise any type of medium or device capable of moving the encoded video data from source device 24 to destination device 26. In one example, link 28 may comprise a communication medium to enable source device 24 to transmit encoded video data directly to destination device 26 in real-time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 26. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 24 to destination device 26.
  • Alternatively, encoded data may be output from output interface 34 to a storage device 38. Similarly, encoded data may be accessed from storage device 38 by input interface 40. Storage device 38 may include any of a variety of distributed or locally accessed data storage media such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In a further example, storage device 38 may correspond to a file server or another intermediate storage device that may hold the encoded video generated by source device 24. Destination device 26 may access stored video data from storage device 38 via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to the destination device 26. Example file servers include a web server (e.g., for a website), an FTP server, network attached storage (NAS) devices, or a local disk drive. Destination device 26 may access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from storage device 38 may be a streaming transmission, a download transmission, or a combination of both.
  • The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions, e.g., via the Internet, encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 22 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
  • In the example of FIG. 3, source device 24 includes a video source 30, video encoder 32 and an output interface 34. In some cases, output interface 34 may include a modulator/demodulator (modem) and/or a transmitter. In source device 24, video source 30 may include a source such as a video capture device, e.g., a video camera, a video archive containing previously captured video, a video feed interface to receive video from a video content provider, and/or a computer graphics system for generating computer graphics data as the source video, or a combination of such sources. As one example, if video source 30 is a video camera, source device 24 and destination device 26 may form so-called camera phones or video phones. However, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications.
  • The captured, pre-captured, or computer-generated video may be encoded by video encoder 32. The encoded video data may be transmitted directly to destination device 26 via output interface 34 of source device 24. The encoded video data may also (or alternatively) be stored onto storage device 38 for later access by destination device 26 or other devices, for decoding and/or playback.
  • Destination device 26 includes an input interface 40, a video decoder 42, and a display device 44. In some cases, input interface 40 may include a receiver and/or a modem. Input interface 40 of destination device 26 receives the encoded video data over link 28. The encoded video data communicated over link 28, or provided on storage device 38, may include a variety of syntax elements generated by video encoder 32 for use by a video decoder, such as video decoder 42, in decoding the video data. Such syntax elements may be included with the encoded video data transmitted on a communication medium, stored on a storage medium, or stored on a file server.
  • Display device 44 may be integrated with, or external to, destination device 26. In some examples, destination device 26 may include an integrated display device and also be configured to interface with an external display device. In other examples, destination device 26 may be a display device. In general, display device 44 displays the decoded video data to a user, and may comprise any of a variety of display devices such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.
  • Video encoder 32 and video decoder 42 may operate according to a video compression standard, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4, Part 10, Advanced Video Coding (AVC), or extensions of such standards. Alternatively, video encoder 32 and video decoder 42 may operate according to other proprietary or industry standards, such as the High Efficiency Video Coding (HEVC) standard, and may conform to the HEVC Test Model (HM). The techniques of this disclosure, however, are not limited to any particular coding standard. Other examples of video compression standards include MPEG-2 and ITU-T H.263.
  • Although not shown in FIG. 3, in some aspects, video encoder 32 and video decoder 42 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, in some examples, MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).
  • Video encoder 32 and video decoder 42 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable, computer-readable storage medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 32 and video decoder 42 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device. For example, the device that includes video decoder 42 may be a microprocessor, an integrated circuit (IC), or a wireless communication device that includes video decoder 42.
  • The JCT-VC is working on development of the HEVC standard. The HEVC standardization efforts are based on an evolving model of a video coding device referred to as the HEVC Test Model (HM). The HM presumes several additional capabilities of video coding devices relative to existing devices according to, e.g., ITU-T H.264/AVC. For example, whereas H.264 provides nine intra-prediction encoding modes, the HM may provide as many as thirty-five intra-prediction encoding modes.
  • In general, the working model of the HM describes that a video frame or picture may be divided into a sequence of treeblocks or largest coding units (LCU) that include both luma and chroma samples. A treeblock has a similar purpose as a macroblock of the H.264 standard. A slice includes a number of consecutive treeblocks in coding order. A video frame or picture may be partitioned into one or more slices. Each treeblock may be split into coding units (CUs) according to a quadtree. For example, a treeblock, as a root node of the quadtree, may be split into four child nodes, and each child node may in turn be a parent node and be split into another four child nodes. A final, unsplit child node, as a leaf node of the quadtree, comprises a coding node, i.e., a coded video block. Syntax data associated with a coded bitstream may define a maximum number of times a treeblock may be split, and may also define a minimum size of the coding nodes.
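  • As an illustrative sketch of the quadtree partitioning just described (the field and function names are hypothetical, not taken from the HEVC specification), a treeblock can be split recursively into four square children until a leaf coding node is reached:

        #include <stdlib.h>

        /* Illustrative coding-quadtree node: a treeblock (e.g., 64x64)
         * splits recursively into four square children; a node with NULL
         * children is a leaf, i.e., an unsplit CU (coding node). */
        typedef struct CUNode {
            int x, y, size;           /* top-left corner and width in pixels */
            struct CUNode *child[4];  /* all NULL for a leaf                 */
        } CUNode;

        CUNode *SplitCU(int x, int y, int size, int minCuSize,
                        int (*shouldSplit)(int x, int y, int size)) {
            CUNode *node = calloc(1, sizeof(CUNode));
            if (!node) return NULL;
            node->x = x; node->y = y; node->size = size;
            if (size > minCuSize && shouldSplit(x, y, size)) {
                int h = size / 2;
                node->child[0] = SplitCU(x,     y,     h, minCuSize, shouldSplit);
                node->child[1] = SplitCU(x + h, y,     h, minCuSize, shouldSplit);
                node->child[2] = SplitCU(x,     y + h, h, minCuSize, shouldSplit);
                node->child[3] = SplitCU(x + h, y + h, h, minCuSize, shouldSplit);
            }
            return node;
        }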
  • A CU includes a coding node and prediction units (PUs) and transform units (TUs) associated with the coding node. As described above, a transform unit includes one or more transform blocks, and the techniques described in this disclosure are related to determining contexts for the significance syntax elements for the transform coefficients of a transform block based on a scan order and, in some examples, based on a scan order and size of the transform block. A size of the CU corresponds to a size of the coding node and must be square in shape. The size of the CU may range from 8×8 pixels up to the size of the treeblock with a maximum of 64×64 pixels or greater. Each CU may contain one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ between whether the CU is skip or direct mode encoded, intra-prediction mode encoded, or inter-prediction mode encoded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a quadtree.
  • A TU can be square or non-square in shape. Again, a TU includes one or more transform blocks (TBs) (e.g., one TB for the luma samples, one TB for the first chroma samples, and one TB for the second chroma samples). In this sense, a TU can be considered conceptually as including these TBs, and these TBs can be square or non-square in shape. For example, in this disclosure, the term TU is used to generically refer to the TBs, and the example techniques described in this disclosure are described with respect to a TB.
  • The HEVC standard allows for transformations according to TUs, which may be different for different CUs. The TUs are typically sized based on the size of PUs within a given CU defined for a partitioned LCU, although this may not always be the case. The TUs are typically the same size or smaller than the PUs. In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure known as “residual quad tree” (RQT). The leaf nodes of the RQT may be referred to as transform units (TUs). Pixel difference values associated with the TUs may be transformed to produce transform coefficients, which may be quantized.
  • In general, a PU includes data related to the prediction process. For example, when the PU is intra-mode encoded (intra-prediction encoded), the PU may include data describing an intra-prediction mode for the PU. As another example, when the PU is inter-mode encoded (inter-prediction encoded), the PU may include data defining a motion vector for the PU. The data defining the motion vector for a PU may describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution for the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list (e.g., List 0 (L0) or List 1 (L1)) for the motion vector.
  • In general, a TU is used for the transform and quantization processes. A given CU having one or more PUs may also include one or more transform units (TUs). The TUs include one or more transform blocks (TBs). Blocks 10, 14, and 18 of FIGS. 1A-1C, respectively, are examples of TBs. Following prediction, video encoder 32 may calculate residual values corresponding to the PU. The residual values comprise pixel difference values that may be transformed into transform coefficients, quantized, and scanned using the TBs to produce serialized transform coefficients for entropy coding. This disclosure typically uses the term “video block” to refer to a coding node of a CU. In some specific cases, this disclosure may also use the term “video block” to refer to a treeblock, i.e., LCU, or a CU, which includes a coding node and PUs. The term “video block” may also refer to transform blocks of a TU.
  • For example, for video coding according to the high efficiency video coding (HEVC) standard currently under development, a video picture may be partitioned into coding units (CUs), prediction units (PUs), and transform units (TUs). A CU generally refers to an image region that serves as a basic unit to which various coding tools are applied for video compression. A CU typically has a square geometry, and may be considered to be similar to a so-called “macroblock” under other video coding standards, such as, for example, ITU-T H.264.
  • To achieve better coding efficiency, a CU may have a variable size depending on the video data it contains. That is, a CU may be partitioned, or “split” into smaller blocks, or sub-CUs, each of which may also be referred to as a CU. In addition, each CU that is not split into sub-CUs may be further partitioned into one or more PUs and TUs for purposes of prediction and transform of the CU, respectively.
  • PUs may be considered to be similar to so-called partitions of a block under other video coding standards, such as H.264. PUs are the basis on which prediction for the block is performed to produce “residual” coefficients. Residual coefficients of a CU represent a difference between video data of the CU and predicted data for the CU determined using one or more PUs of the CU. Specifically, the one or more PUs specify how the CU is partitioned for the purpose of prediction, and which prediction mode is used to predict the video data contained within each partition of the CU.
  • One or more TUs of a CU specify partitions of a block of residual coefficients of the CU on the basis of which a transform is applied to the block to produce a block of residual transform coefficients for the CU. The one or more TUs may also be associated with the type of transform that is applied. The transform converts the residual coefficients from a pixel, or spatial domain to a transform domain, such as a frequency domain. In addition, the one or more TUs may specify parameters on the basis of which quantization is applied to the resulting block of residual transform coefficients to produce a block of quantized residual transform coefficients. The residual transform coefficients may be quantized to possibly reduce the amount of data used to represent the coefficients.
  • A CU generally includes one luminance component, denoted as Y, and two chrominance components, denoted as U and V. In other words, a given CU that is not further split into sub-CUs may include Y, U, and V components, each of which may be further partitioned into one or more PUs and TUs for purposes of prediction and transform of the CU, as previously described. For example, depending on the video sampling format, the size of the U and V components, in terms of a number of samples, may be the same as or different than the size of the Y component. As such, the techniques described above with reference to prediction, transform, and quantization may be performed for each of the Y, U, and V components of a given CU.
  • To encode a CU, one or more predictors for the CU are first derived based on one or more PUs of the CU. A predictor is a reference block that contains predicted data for the CU, and is derived on the basis of a corresponding PU for the CU, as previously described. For example, the PU indicates a partition of the CU for which predicted data is to be determined, and a prediction mode used to determine the predicted data. The predictor can be derived either through intra-(I) prediction (i.e., spatial prediction) or inter-(P or B) prediction (i.e., temporal prediction) modes. Hence, some CUs may be intra-coded (I) using spatial prediction with respect to neighboring reference blocks, or CUs, in the same frame, while other CUs may be inter-coded (P or B) with respect to reference blocks, or CUs, in other frames.
  • Upon identification of the one or more predictors based on the one or more PUs of the CU, a difference between the original video data of the CU corresponding to the one or more PUs and the predicted data for the CU contained in the one or more predictors is calculated. This difference, also referred to as a prediction residual, comprises residual coefficients, and refers to pixel differences between portions of the CU specified by the one or more PUs and the one or more predictors, as previously described. The residual coefficients are generally arranged in a two-dimensional (2-D) array that corresponds to the one or more PUs of the CU.
  • To achieve further compression, the prediction residual is generally transformed, e.g., using a discrete cosine transform (DCT), integer transform, Karhunen-Loeve (K-L) transform, or another transform. The transform converts the prediction residual, i.e., the residual coefficients, in the spatial domain to residual transform coefficients in the transform domain, e.g., a frequency domain, as also previously described. On some occasions the transform is skipped, i.e., no transform is applied to the prediction residual. Transform-skipped coefficients are also referred to as transform coefficients. The transform coefficients (including transform-skipped coefficients) are also generally arranged in a 2-D array that corresponds to the one or more TUs of the CU. For further compression, the residual transform coefficients may be quantized to possibly reduce the amount of data used to represent the coefficients, as also previously described.
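  • As an illustration only (the transform below is a textbook separable DCT-II, not the exact transform specified by any standard, and the residual values are hypothetical), a minimal sketch of converting a block of residual coefficients into the transform domain:

```python
import math

def dct_1d(v):
    """Naive 1-D DCT-II (O(N^2)); illustration only."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(block):
    """Separable 2-D DCT: transform the rows, then the columns."""
    rows = [dct_1d(list(r)) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

residual = [[5, -3, 0, 1],
            [2,  0, 0, 0],
            [1,  0, 0, 0],
            [0,  0, 0, 0]]  # hypothetical 4x4 prediction residual
coeffs = dct_2d(residual)   # energy compacts toward the DC (top-left) corner
```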
  • To achieve still further compression, an entropy coder subsequently encodes the resulting residual transform coefficients, using Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), Probability Interval Partitioning Entropy Coding (PIPE), or another entropy coding methodology. Entropy coding may achieve this further compression by reducing or removing statistical redundancy inherent in the video data of the CU, represented by the coefficients, relative to other CUs.
  • A video sequence typically includes a series of video frames or pictures. A group of pictures (GOP) generally comprises a series of one or more of the video pictures. A GOP may include syntax data in a header of the GOP, a header of one or more of the pictures, or elsewhere, that describes a number of pictures included in the GOP. Each slice of a picture may include slice syntax data that describes an encoding mode for the respective slice. Video encoder 32 typically operates on video blocks within individual video slices in order to encode the video data. A video block may correspond to a coding node within a CU (e.g., a transform block of transform coefficients). The video blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard.
  • As an example, the HM supports prediction in various PU sizes. Assuming that the size of a particular CU is 2N×2N, the HM supports intra-prediction in PU sizes of 2N×2N or N×N, and inter-prediction in symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. The HM also supports asymmetric partitioning for inter-prediction in PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by an “n” followed by an indication of “Up”, “Down,” “Left,” or “Right.” Thus, for example, “2N×nU” refers to a 2N×2N CU that is partitioned horizontally with a 2N×0.5N PU on top and a 2N×1.5N PU on bottom.
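  • As a sketch of the partition arithmetic just described (the helper name and mode strings are illustrative, not standard syntax), the PU dimensions for each named mode of a 2N×2N CU can be tabulated as follows:

```python
def pu_sizes(mode, two_n):
    """Return the list of PU (width, height) pairs for a 2Nx2N CU under the
    named partition mode; the quarter/three-quarter splits realize the
    25%/75% asymmetric partitioning."""
    n = two_n // 2
    q, tq = two_n // 4, 3 * two_n // 4      # 0.5N and 1.5N
    table = {
        '2Nx2N': [(two_n, two_n)],
        'NxN':   [(n, n)] * 4,
        '2NxN':  [(two_n, n)] * 2,
        'Nx2N':  [(n, two_n)] * 2,
        '2NxnU': [(two_n, q), (two_n, tq)],  # 25% partition on top
        '2NxnD': [(two_n, tq), (two_n, q)],  # 25% partition on the bottom
        'nLx2N': [(q, two_n), (tq, two_n)],  # 25% partition on the left
        'nRx2N': [(tq, two_n), (q, two_n)],  # 25% partition on the right
    }
    return table[mode]

print(pu_sizes('2NxnU', 64))  # [(64, 16), (64, 48)] for a 64x64 CU
```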
  • In this disclosure, “N×N” and “N by N” may be used interchangeably to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions, e.g., 16×16 pixels or 16 by 16 pixels. In general, a 16×16 block will have 16 pixels in a vertical direction (y=16) and 16 pixels in a horizontal direction (x=16). Likewise, an N×N block generally has N pixels in a vertical direction and N pixels in a horizontal direction, where N represents a nonnegative integer value. The pixels in a block may be arranged in rows and columns. Moreover, blocks need not necessarily have the same number of pixels in the horizontal direction as in the vertical direction. For example, blocks may comprise N×M pixels, where M is not necessarily equal to N.
  • Following intra-predictive or inter-predictive encoding using the PUs of a CU, video encoder 32 may calculate residual data for the TUs of the CU. The PUs may comprise pixel data in the spatial domain (also referred to as the pixel domain) and the TUs may comprise coefficients in the transform domain following application of a transform, e.g., a discrete cosine transform (DCT), an integer transform, a wavelet transform, skip transform, or a conceptually similar transform to residual video data. The residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the PUs. Video encoder 32 may form the TUs including the residual data for the CU, and then transform the TUs to produce transform coefficients for the CU.
  • Following any transforms to produce transform coefficients, video encoder 32 may perform quantization of the transform coefficients. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the coefficients, providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
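  • A minimal sketch of the kind of uniform scalar quantization described here (the step size and rounding offset are hypothetical and do not reproduce the HEVC quantization formulas):

```python
def quantize(coeff, step, offset=0.5):
    """Map a transform coefficient to a quantized level; larger steps
    discard more precision (fewer bits, more distortion)."""
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) / step + offset)

def dequantize(level, step):
    """Decoder-side reconstruction of an approximate coefficient value."""
    return level * step

level = quantize(137, step=16)      # 9
recon = dequantize(level, step=16)  # 144: a quantization error of 7
```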
  • In some examples, video encoder 32 may utilize a predefined scan order (e.g., horizontal, vertical, or diagonal) to scan the quantized transform coefficients to produce a serialized vector that can be entropy encoded. In some examples, video encoder 32 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 32 may entropy encode the one-dimensional vector, e.g., according to context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), Probability Interval Partitioning Entropy (PIPE) coding or another entropy encoding methodology. Video encoder 32 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 42 in decoding the video data.
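  • A sketch of how the three predefined scans might be generated and applied; the (row, column) coordinate convention and the up-right ordering of the diagonal are assumptions for illustration:

```python
def scan_positions(n, order):
    """Return the (row, col) positions of an n x n block in scan order."""
    if order == 'horizontal':   # row by row, left to right
        return [(r, c) for r in range(n) for c in range(n)]
    if order == 'vertical':     # column by column, top to bottom
        return [(r, c) for c in range(n) for r in range(n)]
    if order == 'diagonal':     # up-right diagonals starting at the DC corner
        pos = []
        for d in range(2 * n - 1):
            for r in range(min(d, n - 1), max(0, d - n + 1) - 1, -1):
                pos.append((r, d - r))
        return pos
    raise ValueError(order)

def serialize(block, order):
    """Scan a 2-D block of quantized coefficients into a 1-D vector."""
    return [block[r][c] for r, c in scan_positions(len(block), order)]
```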
  • To perform CABAC, video encoder 32 may assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of the symbol are non-zero or not. To perform CAVLC, video encoder 32 may select a variable length code for a symbol to be transmitted. Codewords in VLC may be constructed such that relatively shorter codes correspond to more probable symbols, while longer codes correspond to less probable symbols. In this way, the use of VLC may achieve a bit savings over, for example, using equal-length codewords for each symbol to be transmitted. The probability determination may be based on a context assigned to the symbol.
  • Video decoder 42 may be configured to implement the reciprocal of the encoding techniques implemented by video encoder 32. For example, for the encoded significance syntax elements, video decoder 42 may decode the significance syntax elements by determining which contexts to use based on the determined scan order.
  • For instance, video encoder 32 signals syntax elements that indicate the values of the transform coefficients. As one example, video encoder 32 generates these syntax elements in five passes, although using five passes is not necessary in every example. Video encoder 32 determines the location of the last significant coefficient and begins the first pass from the last significant coefficient. After the first pass, video encoder 32 implements the remaining four passes only on those transform coefficients remaining from the previous pass. In the first pass, video encoder 32 scans the transform coefficients using one of the scan orders illustrated in FIGS. 1A-1C and determines a significance syntax element for each transform coefficient that indicates whether the value for the transform coefficient is zero or non-zero (i.e., insignificant or significant).
  • In the second pass, referred to as a greater than one pass, video encoder 32 generates syntax elements to indicate whether the absolute value of a significant coefficient is larger than one. In a similar manner, in the third pass, referred to as the greater than two pass, video encoder 32 generates syntax elements to indicate whether the absolute value of a greater than one coefficient is larger than two.
  • In the fourth pass, referred to as a sign pass, video encoder 32 generates syntax elements to indicate the sign information for significant coefficients. In the fifth pass, referred to as a coefficient level remaining pass, video encoder 32 generates syntax elements that indicate the remaining absolute value of a transform coefficient level (e.g., the remainder value). The remainder value may be coded as the absolute value of the coefficient minus 3. It should be noted that the five-pass approach is just one example technique that may be used for coding transform coefficients, and the techniques described herein may be equally applicable to other techniques.
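  • A compact sketch of the five-pass structure described above, operating on coefficient values already placed in reverse scan order starting at the last significant coefficient (names are illustrative, and details such as per-group limits on the number of greater-than-one and greater-than-two flags are omitted):

```python
def five_pass_syntax(coeffs):
    """coeffs: values in reverse scan order, beginning at the last
    significant coefficient. Returns the five syntax-element lists."""
    sig   = [1 if c != 0 else 0 for c in coeffs]                  # pass 1
    nz    = [c for c in coeffs if c != 0]
    gt1   = [1 if abs(c) > 1 else 0 for c in nz]                  # pass 2
    gt2   = [1 if abs(c) > 2 else 0 for c in nz if abs(c) > 1]    # pass 3
    signs = [1 if c < 0 else 0 for c in nz]                       # pass 4
    rem   = [abs(c) - 3 for c in nz if abs(c) > 2]                # pass 5
    return sig, gt1, gt2, signs, rem

# e.g. coeffs = [1, 0, -2, 5] -> sig [1, 0, 1, 1], gt1 [0, 1, 1],
# gt2 [0, 1], signs [0, 1, 0], rem [2]
```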
  • In the techniques described in this disclosure, video encoder 32 encodes the significance syntax elements using context adaptive binary arithmetic coding (CABAC). In accordance with the techniques described in this disclosure, video encoder 32 may determine a scan order for the transform coefficients of the block, and determine contexts for the significance syntax elements of the transform coefficients of the block based on the determined scan order. Video encoder 32 may CABAC encode the significance syntax elements based on the determined contexts, and signal the encoded significance syntax elements in the coded bitstream.
  • Video decoder 42 may be configured to perform similar functions. For example, video decoder 42 receives from the coded bitstream significance syntax elements of transform coefficients of a block. Video decoder 42 may determine a scan order for the transform coefficients of the block (e.g., an order in which video encoder 32 scanned the transform coefficients). Video decoder 42 may determine contexts for the significance syntax elements of the transform coefficients based on the determined scan order. Video decoder 42 may then CABAC decode the significance syntax elements of the transform coefficients based at least on the determined contexts.
  • In some examples, video encoder 32 and video decoder 42 each determine contexts that are the same whether the determined scan order is a horizontal scan or a vertical scan, and determine different contexts, distinct from those used for the horizontal and vertical scans, if the determined scan order is a diagonal scan. In general, video encoder 32 and video decoder 42 may each determine a first set of contexts for the significance syntax elements if the scan order is a first scan order, and determine a second set of contexts for the significance syntax elements if the scan order is a second scan order. The first set of contexts and the second set of contexts may be the same in some cases (e.g., where the first scan order is a horizontal scan and the second scan order is a vertical scan, or vice-versa). The first set of contexts and the second set of contexts may be different in some cases (e.g., where the first scan order is either a horizontal or a vertical scan and the second scan order is not a horizontal or a vertical scan).
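  • The sharing rule just described can be sketched as a simple selector; the set indices below are placeholders, not actual context indices from any standard:

```python
def significance_context_set(scan_order):
    """Horizontal and vertical scans share one set of contexts for the
    significance syntax elements; the diagonal scan gets its own set."""
    if scan_order in ('horizontal', 'vertical'):
        return 0    # shared first set
    return 1        # distinct second set for the diagonal scan

assert significance_context_set('horizontal') == significance_context_set('vertical')
assert significance_context_set('diagonal') != significance_context_set('horizontal')
```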
  • In some examples, video encoder 32 and video decoder 42 also determine a size of the block. In some of these examples, video encoder 32 and video decoder 42 determine the contexts for the significance syntax elements based on the determined scan order and based on the determined size of the block. For example, to determine the contexts, video encoder 32 and video decoder 42 may determine, based on the size of the block, contexts for the significance syntax elements of the transform coefficients that are the same for all scan orders. In other words, for certain sized blocks, video encoder 32 and video decoder 42 may determine contexts that are the same for all scan orders.
  • In some examples, the techniques described in this disclosure may build upon the concepts of sub-block horizontal and vertical scans, such as those described in: (1) Rosewarne, C., Maeda, M. “Non-CE11: Harmonisation of 8×8 TU residual scan,” JCT-VC Contribution JCTVC-H0145; (2) Yu, Y., Panusopone, K., Lou, J., Wang, L. “Adaptive Scan for Large Blocks for HEVC,” JCT-VC Contribution JCTVC-F569; and (3) U.S. patent application Ser. No. 13/551,458, filed Jul. 17, 2012. For instance, the techniques described in this disclosure provide for improvement in the coding of significance syntax elements and harmonization across different scan orders and block (e.g., TU) sizes.
  • For example, as described above, a 4×4 block may be a sub-block of a larger block. In the techniques described in this disclosure, relatively large sized blocks (e.g., 16×16 or 32×32) may be divided into 4×4 sub-blocks, and video encoder 32 and video decoder 42 may be configured to determine the contexts for the 4×4 sub-blocks based on the scan order. In some examples, such techniques may be extendable to 8×8 sized blocks as well as for all scan orders (i.e., the 4×4 sub-blocks of the 8×8 block can be scanned horizontally, vertically, or diagonally). Such techniques may also allow for context sharing between the different scan orders.
  • In some examples, video encoder 32 and video decoder 42 determine contexts that are the same for all block sizes if the scan order is a diagonal scan (i.e., the contexts are shared for all of the TUs when using the diagonal scan). In this example, video encoder 32 and video decoder 42 may determine another set of contexts that are the same for the horizontal and vertical scan, which allows for context sharing depending on the scan order.
  • In some examples, there may be three sets of contexts: one for relatively large blocks, one for the diagonal scan of the 8×8 block or the 4×4 block, and one for both horizontal and vertical scans of the 8×8 block or the 4×4 block, where the contexts for the 8×8 block and the 4×4 block are different. Other combinations and permutations of the sizes and the scan orders may be possible, and video encoder 32 and video decoder 42 may be configured to determine contexts that are the same for these various combinations and permutations of sizes and scan orders.
  • FIG. 4 is a block diagram illustrating an example video encoder 32 that may implement the techniques described in this disclosure. In the example of FIG. 4, video encoder 32 includes a mode select unit 46, prediction processing unit 48, reference picture memory 70, summer 56, transform processing unit 58, quantization processing unit 60, and entropy encoding unit 62. Prediction processing unit 48 includes motion estimation unit 50, motion compensation unit 52, and intra prediction unit 54. For video block reconstruction, video encoder 32 also includes inverse quantization processing unit 64, inverse transform processing unit 66, and summer 68. A deblocking filter (not shown in FIG. 4) may also be included to filter block boundaries to remove blockiness artifacts from reconstructed video. If desired, the deblocking filter would typically filter the output of summer 68. Additional loop filters (in loop or post loop) may also be used in addition to the deblocking filter. It should be noted that prediction processing unit 48 and transform processing unit 58 should not be confused with PUs and TUs as described above.
  • As shown in FIG. 4, video encoder 32 receives video data, and mode select unit 46 partitions the data into video blocks. This partitioning may also include partitioning into slices, tiles, or other larger units, as well as video block partitioning, e.g., according to a quadtree structure of LCUs and CUs. Video encoder 32 generally illustrates the components that encode video blocks within a video slice to be encoded. A slice may be divided into multiple video blocks (and possibly into sets of video blocks referred to as tiles). Prediction processing unit 48 may select one of a plurality of possible coding modes, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes, for the current video block based on error results (e.g., coding rate and the level of distortion). Prediction processing unit 48 may provide the resulting intra- or inter-coded block to summer 56 to generate residual block data and to summer 68 to reconstruct the encoded block for use as a reference picture.
  • Intra prediction unit 54 within prediction processing unit 48 may perform intra-predictive coding of the current video block relative to one or more neighboring blocks in the same frame or slice as the current block to be coded to provide spatial compression. Motion estimation unit 50 and motion compensation unit 52 within prediction processing unit 48 perform inter-predictive coding of the current video block relative to one or more predictive blocks in one or more reference pictures to provide temporal compression.
  • Motion estimation unit 50 may be configured to determine the inter-prediction mode for a video slice according to a predetermined pattern for a video sequence. The predetermined pattern may designate video slices in the sequence as P slices or B slices. Motion estimation unit 50 and motion compensation unit 52 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 50, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, may indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference picture.
  • A predictive block is a block that is found to closely match the PU of the video block to be coded in terms of pixel difference, which may be determined by sum of absolute difference (SAD), sum of square difference (SSD), or other difference metrics. In some examples, video encoder 32 may calculate values for sub-integer pixel positions of reference pictures stored in reference picture memory 70. For example, video encoder 32 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Therefore, motion estimation unit 50 may perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
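  • As a concrete illustration of one of the difference metrics named above, a minimal sum-of-absolute-differences (SAD) computation of the kind used to rank candidate predictive blocks:

```python
def sad(block, candidate):
    """Sum of absolute differences between the current block and a
    candidate predictive block of the same dimensions."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block, candidate)
               for a, b in zip(row_a, row_b))

# A motion search keeps the candidate (and hence the motion vector)
# yielding the lowest SAD.
```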
  • Motion estimation unit 50 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identify one or more reference pictures stored in reference picture memory 70. Motion estimation unit 50 sends the calculated motion vector to entropy encoding unit 62 and motion compensation unit 52.
  • Motion compensation, performed by motion compensation unit 52, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation, possibly performing interpolations to sub-pixel precision. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 52 may locate the predictive block to which the motion vector points in one of the reference picture lists. Video encoder 32 forms a residual video block by subtracting pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values. The pixel difference values form residual data for the block, and may include both luma and chroma difference components. Summer 56 represents the component or components that perform this subtraction operation. Motion compensation unit 52 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 42 in decoding the video blocks of the video slice.
  • Intra-prediction unit 54 may intra-predict a current block, as an alternative to the inter-prediction performed by motion estimation unit 50 and motion compensation unit 52, as described above. In particular, intra-prediction unit 54 may determine an intra-prediction mode to use to encode a current block. In some examples, intra-prediction unit 54 may encode a current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction unit 54 (or mode select unit 46, in some examples) may select an appropriate intra-prediction mode to use from the tested modes. For example, intra-prediction unit 54 may calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes, and select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as a bit rate (that is, a number of bits) used to produce the encoded block. Intra-prediction unit 54 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
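  • The mode decision described here is commonly expressed as minimizing a Lagrangian cost J = D + λ·R; a sketch in which the lambda value and the per-mode distortion/rate numbers are hypothetical:

```python
def best_mode(candidates, lam):
    """candidates: (mode, distortion, bits) triples. Select the mode that
    minimizes the rate-distortion cost J = D + lambda * R."""
    return min(candidates, key=lambda m: m[1] + lam * m[2])

modes = [('DC', 1200, 30), ('planar', 1100, 45), ('angular_10', 900, 80)]
print(best_mode(modes, lam=5.0))  # ('angular_10', 900, 80): J = 900 + 400
```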
  • In any case, after selecting an intra-prediction mode for a block, intra-prediction unit 54 may provide information indicative of the selected intra-prediction mode for the block to entropy encoding unit 62. Entropy encoding unit 62 may encode the information indicating the selected intra-prediction mode in accordance with the entropy techniques described herein.
  • After prediction processing unit 48 generates the predictive block for the current video block via either inter-prediction or intra-prediction, video encoder 32 forms a residual video block by subtracting the predictive block from the current video block. The residual video data in the residual block may be included in one or more TBs and applied to transform processing unit 58. Transform processing unit 58 may transform the residual video data into residual transform coefficients using a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform. Transform processing unit 58 may convert the residual video data from a pixel domain to a transform domain, such as a frequency domain. In some cases, transform processing unit 58 may apply a 2-dimensional (2-D) transform (in both the horizontal and vertical direction) to the residual data in the TBs. In some examples, transform processing unit 58 may instead apply a horizontal 1-D transform, a vertical 1-D transform, or no transform to the residual data in each of the TBs.
  • Transform processing unit 58 may send the resulting transform coefficients to quantization processing unit 60. Quantization processing unit 60 quantizes the transform coefficients to further reduce the bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization processing unit 60 may then perform a scan of the matrix including the quantized transform coefficients. Alternatively, entropy encoding unit 62 may perform the scan.
  • As described above, the scan performed on a transform block may be based on the size of the transform block. Quantization processing unit 60 and/or entropy encoding unit 62 may scan 8×8, 16×16, and 32×32 transform blocks using any combination of the sub-block scans described above with respect to FIGS. 1A-1C. When more than one scan is available for a transform block, entropy encoding unit 62 may determine a scan order based on a coding parameter associated with the transform block, such as a prediction mode associated with a prediction unit corresponding to the transform block. Further details with respect to entropy encoding unit 62 are described below with respect to FIG. 5.
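  • A sketch of a 4×4 sub-block based scan of a larger transform block, reusing the scan_positions helper sketched earlier; visiting the sub-blocks and the coefficients within each sub-block in the same order is an assumption for illustration:

```python
def subblock_scan_positions(n, order='diagonal'):
    """Scan an n x n block (n a multiple of 4) as 4x4 sub-blocks: visit
    the sub-blocks in the given order, and the 16 coefficients inside
    each sub-block in that same order."""
    positions = []
    for sr, sc in scan_positions(n // 4, order):   # which sub-block
        for r, c in scan_positions(4, order):      # position inside it
            positions.append((4 * sr + r, 4 * sc + c))
    return positions
```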
  • Inverse quantization processing unit 64 and inverse transform processing unit 66 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block of a reference picture. Motion compensation unit 52 may calculate a reference block by adding the residual block to a predictive block of one of the reference pictures within one of the reference picture lists. Motion compensation unit 52 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 68 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 52 to produce a reference block for storage in reference picture memory 70. The reference block may be used by motion estimation unit 50 and motion compensation unit 52 as a reference block to inter-predict a block in a subsequent video frame or picture.
  • Following quantization, entropy encoding unit 62 entropy encodes the quantized transform coefficients. For example, entropy encoding unit 62 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding or another entropy encoding methodology or technique. Following the entropy encoding by entropy encoding unit 62, the encoded bitstream may be transmitted to video decoder 42, or archived for later transmission or retrieval by video decoder 42. Entropy encoding unit 62 may also entropy encode the motion vectors and the other syntax elements for the current video slice being coded. Entropy encoding unit 62 may entropy encode syntax elements such as the significance syntax elements and the other syntax elements for the transform coefficients described above using CABAC.
  • In some examples, entropy encoding unit 62 may be configured to implement the techniques described in this disclosure of determining contexts based on a determined scan order. In some examples, entropy encoding unit 62 in conjunction with one or more units within video encoder 32 may be configured to implement the techniques described in this disclosure. In some examples, a processor or processing unit (not shown) of video encoder 32 may be configured to implement the techniques described in this disclosure.
  • FIG. 5 is a block diagram that illustrates an example entropy encoding unit 62 that may implement the techniques described in this disclosure. The entropy encoding unit 62 illustrated in FIG. 5 may be a CABAC encoder. The example entropy encoding unit 62 may include a binarization unit 72, an arithmetic encoding unit 80, which includes a bypass encoding engine 74 and a regular encoding engine 78, and a context modeling unit 76.
  • Entropy encoding unit 62 may receive one or more syntax elements, such as the significance syntax element, referred to as significant_coefficient_flag in HEVC, the greater than 1 flag, referred to as the coeff_abs_level_greater1 flag in HEVC, the greater than 2 flag, referred to as the coeff_abs_level_greater2 flag in HEVC, the sign flag, referred to as coeff_sign_flag in HEVC, and the level syntax element, referred to as coeff_abs_level_remain. Binarization unit 72 receives a syntax element and produces a bin string (i.e., binary string). Binarization unit 72 may use, for example, any one or combination of the following techniques to produce a bin string: fixed length coding, unary coding, truncated unary coding, truncated Rice coding, Golomb coding, exponential Golomb coding, and Golomb-Rice coding. Further, in some cases, binarization unit 72 may receive a syntax element as a binary string and simply pass through the bin values. In one example, binarization unit 72 receives the significance syntax element and produces a bin string.
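  • One of the binarizations named above, truncated unary, can be sketched as follows (illustration only; actual syntax elements typically combine several binarizations, e.g., truncated unary prefixes with Golomb-Rice suffixes):

```python
def truncated_unary(value, c_max):
    """Truncated unary bin string: `value` ones followed by a terminating
    zero, with the zero dropped when value == c_max."""
    if value < c_max:
        return '1' * value + '0'
    return '1' * c_max

assert truncated_unary(2, 4) == '110'
assert truncated_unary(4, 4) == '1111'
```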
  • Arithmetic encoding unit 80 is configured to receive a bin string from binarization unit 72 and perform arithmetic encoding on the bin string. As shown in FIG. 5, arithmetic encoding unit 80 may receive bin values from a bypass path or the regular coding path. Bin values that follow the bypass path may be bin values identified as bypass coded, and bin values that follow the regular encoding path may be identified as CABAC-coded. Consistent with the CABAC process described above, in the case where arithmetic encoding unit 80 receives bin values from a bypass path, bypass encoding engine 74 may perform arithmetic encoding on bin values without utilizing an adaptive context assigned to a bin value. In one example, bypass encoding engine 74 may assume equal probabilities for possible values of a bin.
  • In the case where arithmetic encoding unit 80 receives bin values through the regular path, context modeling unit 76 may provide a context variable (e.g., a context state), such that regular encoding engine 78 may perform arithmetic encoding based on the context assignments provided by context modeling unit 76. The context assignments may be defined according to a video coding standard, such as the HEVC standard. Further, in one example context modeling unit 76 and/or entropy encoding unit 62 may be configured to determine contexts for bins of the significance syntax elements based on techniques described herein. The techniques may be incorporated into HEVC or another video coding standard. The context models may be stored in memory. Context modeling unit 76 may include a series of indexed tables and/or utilize mapping functions to determine a context and a context variable for a particular bin. After encoding a bin value, regular encoding engine 78 may update a context based on the actual bin values.
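  • A toy adaptive context of the kind the regular engine updates after each bin; this frequency-count model is only a stand-in for the finite-state probability tables an actual CABAC engine uses:

```python
class BinContext:
    """Tracks an adaptive probability estimate for one context."""

    def __init__(self):
        self.counts = [1, 1]    # Laplace-smoothed counts of 0s and 1s

    def p_one(self):
        return self.counts[1] / sum(self.counts)

    def update(self, bin_value):
        """Adapt toward the bin values actually observed in this context."""
        self.counts[bin_value] += 1

ctx = BinContext()
for b in (1, 1, 0, 1):
    ctx.update(b)
print(round(ctx.p_one(), 2))    # 0.67 after observing mostly ones
```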
  • FIG. 6 is a flowchart illustrating an example process for encoding video data according to this disclosure. Although the process in FIG. 6 is described below as generally being performed by video encoder 32, the process may be performed by any combination of video encoder 32, entropy encoding unit 62, and/or context modeling unit 76.
  • As illustrated, video encoder 32 may determine a scan order for transform coefficients of a block (82). Video encoder 32 may determine contexts for the transform coefficients based on the scan order (84). In some examples, video encoder 32 determines the contexts based on the determined scan order, positions of the transform coefficients within the block, and a size of the block. For example, for a particular block size (e.g., an 8×8 block of transform coefficients) and a particular position (e.g., transform coefficient position), video encoder 32 may determine the same context if the scan order is either the horizontal scan or the vertical scan, and determine a different context if the scan order is not the horizontal scan or the vertical scan.
  • Video encoder 32 may CABAC encode significance syntax elements (e.g., significance flags) for the transform coefficients based on the determined contexts (86). Video encoder 32 may signal the encoded significance syntax elements (e.g., significance flags) (88).
  • FIG. 7 is a block diagram illustrating an example video decoder 42 that may implement the techniques described in this disclosure. In the example of FIG. 7, video decoder 42 includes an entropy decoding unit 90, prediction processing unit 92, inverse quantization processing unit 98, inverse transform processing unit 100, summer 102, and reference picture memory 104. Prediction processing unit 92 includes motion compensation unit 94 and intra prediction unit 96. Video decoder 42 may, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 32 from FIG. 4.
  • During the decoding process, video decoder 42 receives an encoded video bitstream that represents video blocks of an encoded video slice and associated syntax elements from video encoder 32. Entropy decoding unit 90 of video decoder 42 entropy decodes the bitstream to generate quantized coefficients, motion vectors, and other syntax elements. Entropy decoding unit 90 forwards the motion vectors and other syntax elements to prediction processing unit 92. Video decoder 42 may receive the syntax elements at the video slice level and/or the video block level.
  • In some examples, entropy decoding unit 90 may be configured to implement the techniques described in this disclosure of determining contexts based on a determined scan order. In some examples, entropy decoding unit 90 in conjunction with one or more units within video decoder 42 may be configured to implement the techniques described in this disclosure. In some examples, a processor or processing unit (not shown) of video decoder 42 may be configured to implement the techniques described in this disclosure.
  • FIG. 8 is a block diagram that illustrates an example entropy decoding unit 90 that may implement the techniques described in this disclosure. Entropy decoding unit 90 receives an entropy encoded bitstream and decodes syntax elements from the bitstream. The syntax elements may include the significant_coefficient_flag, coeff_abs_level_remain, coeff_abs_level_greater1 flag, coeff_abs_level_greater2 flag, and coeff_sign_flag syntax elements described above for transform coefficients of a block. The example entropy decoding unit 90 in FIG. 8 includes an arithmetic decoding unit 106, which may include a bypass decoding engine 108 and a regular decoding engine 110. The example entropy decoding unit 90 also includes context modeling unit 112 and inverse binarization unit 114. The example entropy decoding unit 90 may perform the reciprocal functions of the example entropy encoding unit 62 described with respect to FIG. 5. In this manner, entropy decoding unit 90 may perform entropy decoding based on the techniques described in this disclosure.
  • Arithmetic decoding unit 106 receives an encoded bitstream. As shown in FIG. 8, arithmetic decoding unit 106 may process encoded bin values according to a bypass path or the regular coding path. An indication of whether an encoded bin value should be processed according to the bypass path or the regular path may be signaled in the bitstream with higher level syntax. Consistent with the CABAC process described above, in the case where arithmetic decoding unit 106 receives bin values from a bypass path, bypass decoding engine 108 may perform arithmetic decoding on the bin values without utilizing a context assigned to a bin value. In one example, bypass decoding engine 108 may assume equal probabilities for possible values of a bin.
  • In the case where arithmetic decoding unit 106 receives bin values through the regular path, context modeling unit 112 may provide a context variable, such that regular decoding engine 110 may perform arithmetic decoding based on the context assignments provided by context modeling unit 112. The context assignments may be defined according to a video coding standard, such as HEVC. The context models may be stored in memory. Context modeling unit 112 may include a series of indexed tables and/or utilize mapping functions to determine a context and a context variable for a particular portion of an encoded bitstream. Further, in one example, context modeling unit 112 and/or entropy decoding unit 90 may be configured to assign contexts to bins of the significance syntax elements based on techniques described herein. After decoding a bin value, regular decoding engine 110 may update a context based on the decoded bin values. Further, inverse binarization unit 114 may perform an inverse binarization on a bin value and use a bin matching function to determine if a bin value is valid. The inverse binarization unit 114 may also update the context modeling unit based on the matching determination. Thus, the inverse binarization unit 114 outputs syntax elements according to a context adaptive decoding technique.
  • Referring back to FIG. 7, when the video slice is coded as an intra-coded (I) slice, intra prediction unit 96 of prediction processing unit 92 may generate prediction data for a video block of the current video slice based on a signaled intra prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded (i.e., B or P) slice, motion compensation unit 94 of prediction processing unit 92 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 90. The predictive blocks may be produced from one of the reference pictures within one of the reference picture lists. Video decoder 42 may construct the reference picture lists, List 0 and List 1, using default construction techniques based on reference pictures stored in reference picture memory 104.
  • Motion compensation unit 94 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, motion compensation unit 94 uses some of the received syntax elements to determine a prediction mode (e.g., intra- or inter-prediction) used to code the video blocks of the video slice, an inter-prediction slice type (e.g., B slice or P slice), construction information for one or more of the reference picture lists for the slice, motion vectors for each inter-encoded video block of the slice, inter-prediction status for each inter-coded video block of the slice, and other information to decode the video blocks in the current video slice.
  • Motion compensation unit 94 may also perform interpolation based on interpolation filters. Motion compensation unit 94 may use interpolation filters as used by video encoder 32 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 94 may determine the interpolation filters used by video encoder 32 from the received syntax elements and use the interpolation filters to produce predictive blocks.
  • Inverse quantization processing unit 98 inverse quantizes, i.e., de-quantizes, the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 90. The inverse quantization process may include use of a quantization parameter calculated by video encoder 32 for each video block in the video slice to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied. Inverse transform processing unit 100 applies an inverse transform, e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain.
  • In some cases, inverse transform processing unit 100 may apply a 2-dimensional (2-D) inverse transform (in both the horizontal and vertical direction) to the coefficients. In some examples, inverse transform processing unit 100 may instead apply a horizontal 1-D inverse transform, a vertical 1-D inverse transform, or no transform to the residual data in each of the TUs. The type of transform applied to the residual data at video encoder 32 may be signaled to video decoder 42 so that video decoder 42 can apply an appropriate type of inverse transform to the transform coefficients.
  • After motion compensation unit 94 generates the predictive block for the current video block based on the motion vectors and other syntax elements, video decoder 42 forms a decoded video block by summing the residual blocks from inverse transform processing unit 100 with the corresponding predictive blocks generated by motion compensation unit 94. Summer 102 represents the component or components that perform this summation operation. If desired, a deblocking filter may also be applied to filter the decoded blocks in order to remove blockiness artifacts. Other loop filters (either in the coding loop or after the coding loop) may also be used to smooth pixel transitions, or otherwise improve the video quality. The decoded video blocks in a given frame or picture are then stored in reference picture memory 104, which stores reference pictures used for subsequent motion compensation. Reference picture memory 104 also stores decoded video for later presentation on a display device, such as display device 44 of FIG. 3.
  • FIG. 9 is a flowchart illustrating an example process for decoding video data according to this disclosure. Although the process in FIG. 9 is described below as generally being performed by video decoder 42, the process may be performed by any combination of video decoder 42, entropy decoding unit 90, and/or context modeling unit 112.
  • As illustrated in FIG. 9, video decoder 42 receives, from a coded bitstream, significance syntax elements (e.g., significance flags) for transform coefficients of a block (116). Video decoder 42 determines a scan order for the transform coefficients (118). Video decoder 42 determines contexts for the transform coefficients based on the determined scan order (120). In some examples, video decoder 42 also determines the block size and determines the contexts based on the determined scan order and block size. In some examples, video decoder 42 determines the contexts based on the determined scan order, positions of the transform coefficients within the block, and a size of the block. For example, for a particular block size (e.g., an 8×8 block of transform coefficients) and a particular position (e.g., transform coefficient position), video decoder 42 may determine the same context if the scan order is either the horizontal scan or the vertical scan, and determine a different context if the scan order is not the horizontal scan or the vertical scan. Video decoder 42 CABAC decodes the significance syntax elements (e.g., significance flags) based on the determined contexts (122).
  • Video encoder 32, as described in the flowchart of FIG. 6, and video decoder 42, as described in the flowchart of FIG. 9, may be configured to implement various other example techniques described in this disclosure. For example, to determine the contexts, video encoder 32 and video decoder 42 may be configured to determine contexts that are the same if the determined scan order is the horizontal scan or the vertical scan, and to determine contexts different from those if the determined scan order is neither the horizontal scan nor the vertical scan (e.g., a diagonal scan).
  • In some examples, to determine the contexts, video encoder 32 and video decoder 42 may be configured to determine a first set of contexts for the significance syntax elements if the scan order is a first scan order, and determine a second set of contexts for the significance syntax elements if the scan order is a second scan order. In some of these examples, the first set of contexts is the same as the second set of contexts if the first scan order is a horizontal scan and the second scan order is a vertical scan. In some of these examples, the first set of contexts is different than the second set of contexts if the first scan order is one of a horizontal scan or a vertical scan and the second scan order is not the horizontal scan or the vertical scan.
  • In some examples, video encoder 32 and video decoder 42 may determine a size of the block. In some of these examples, video encoder 32 and video decoder 42 may determine the contexts based on the scan order and the determined size of the block. As one example, video encoder 32 and video decoder 42 may determine, based on the determined size of the block, the contexts for the significance syntax elements of the transform coefficients that are the same for all scan orders (i.e., for some block sizes, the contexts are the same for all scan orders).
  • For example, video encoder 32 and video decoder 42 may determine whether the size of the block is a first size or a second size. One example of the first size is the 4×4 block, and one example of the second size is the 8×8 block. If the size of the block is the first size (e.g., the 4×4 block), video encoder 32 and video decoder 42 may determine contexts that are the same for all scan orders (e.g., contexts that are the same for the diagonal, horizontal, and vertical scans for the 4×4 block). If the size of the block is the second size (e.g., the 8×8 block), video encoder 32 and video decoder 42 may determine contexts that are different for at least two different scan orders (e.g., the contexts for the diagonal scan of the 8×8 block are different than the contexts for the horizontal or vertical scan of the 8×8 block, but the contexts for the horizontal and vertical scans of the 8×8 block may be the same).
  • The following describes various additional techniques for improving the manner in which transform coefficients are coded, such as transform coefficients resulting from intra-coding, as one example. However, the techniques may be applicable to other examples as well, such as for inter-coding. The following techniques can be used individually or in conjunction with any of the other techniques described in this disclosure. Moreover, the techniques described above may be used in conjunction with any of the following techniques, or may be implemented separately from any of the following techniques.
  • In some examples, video encoder 32 and video decoder 42 may utilize one scan order to determine the location of the last significant coefficient, and a different scan order to determine neighborhood contexts for the transform coefficients. Video encoder 32 and video decoder 42 may then code significance flags, level information, and sign information based on the determined neighborhood contexts. For example, video encoder 32 and video decoder 42 may utilize a horizontal or vertical scan (referred to as the nominal scan) to identify the last significant transform coefficient, and then utilize a diagonal scan on the 4×4 blocks or 4×4 sub-blocks (if an 8×8 block) to determine the neighborhood contexts.
  • In some examples, for 16×16 and 32×32 blocks, a neighborhood (in the transform domain) of the current coefficient being processed is used for derivation of the context used to code the significance flag for the coefficient. Similarly, in JCTVC-H0228, a neighborhood is used for coding significance as well as level information for all block sizes. Using neighborhood-based contexts for 4×4 and 8×8 blocks may improve the coding efficiency of HEVC. But if the existing significance neighborhoods for significance maps from some other techniques are used with horizontal or vertical scans, the ability to derive contexts in parallel may be affected. Hence, in some examples, a scheme is described which uses certain aspects of horizontal and vertical scans with the neighborhood used for significance coding from some other techniques.
  • This is accomplished as follows. In some examples, first the position of the last significant coefficient in the scan order is coded in the bit-stream. This is followed by the significance map for a subset of 16 coefficients (a 4×4 sub-block in case of a 4×4 sub-block based diagonal scan) in backwards scan order, followed by coding passes for level information and sign. It should be noted that the position of the last significant coefficient depends directly on the specific scan that is used. An example of this is shown in FIG. 10.
  • FIG. 10 is a conceptual diagram illustrating positions of a last significant coefficient depending on the scan order. FIG. 10 illustrates block 124. The coefficients shown with solid circles are significant. For a horizontal scan, the position of the last significant coefficient is (1, 2) in (row, column) format (transform coefficient 128). For a 4×4 sub-block based diagonal scan (up-right), the position of the last significant coefficient is (0, 3) (transform coefficient 126).
  • In this example, for horizontal or vertical scans, the last significant coefficient position is still determined and coded based on the nominal scan. But then, for coding significance, level and sign information, the block is scanned using a 4×4 sub-block based diagonal scan starting with the bottom-right coefficient and proceeding backwards to the DC coefficient. If it can be derived from the position of the last significant coefficient that a particular coefficient is not significant, no significance, level or sign information is coded for that coefficient.
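  • A sketch of how the last significant coefficient position depends on the scan, reusing the scan_positions helper sketched earlier; the example block below is hypothetical, not the block of FIG. 10:

```python
def last_significant(block, positions):
    """Return the (row, col) of the last non-zero coefficient along the
    given forward scan, or None if the block is all zero."""
    last = None
    for r, c in positions:
        if block[r][c] != 0:
            last = (r, c)
    return last

block = [[0, 0, 0, 1],
         [0, 0, 1, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
print(last_significant(block, scan_positions(4, 'horizontal')))  # (2, 0)
print(last_significant(block, scan_positions(4, 'diagonal')))    # (0, 3)
```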
  • An example of this approach is shown in FIG. 11 for a horizontal scan. FIG. 11 is a conceptual diagram illustrating use of a diagonal scan in place of an original horizontal scan. FIG. 11 illustrates block 130. The coefficients with solid fill are significant. The position of the last significant coefficient, assuming a horizontal scan, is (1, 1) (transform coefficient 132). All coefficients with row indices greater than 1 can be inferred to be not significant. Similarly, all coefficients with row index 1 and column index greater than 1 can be inferred to be not significant. Likewise, the coefficient at (1, 1) can be inferred to be significant, although its level and sign information cannot be inferred. For coding of significance, level, and sign information, a backward 4×4 sub-block based diagonal scan is used. Starting with the bottom-right coefficient, the significance flags are encoded. The significance flags that can be inferred are not explicitly coded. A neighborhood-based context is used for coding of significance flags. The neighborhood may be the same as that used for 16×16 and 32×32 blocks, or a different neighborhood may be used. It should be noted that, similar to above, separate sets of neighborhood-based contexts may be used for the different scans (horizontal, vertical, and 4×4 sub-block). Also, the contexts may be shared between different block sizes.
  • In another example, any of various techniques, such as those of JCTVC-H0228, may be used for coding significance, level, and sign information for 4×4 and 8×8 blocks after the position of the last significant coefficient is coded assuming the nominal scan. For coding of significance, level, and sign information, a 4×4 sub-block based diagonal scan may be used.
  • It should be noted that the method is not restricted to horizontal, vertical, and 4×4 sub-block based diagonal scans. The basic principle is to send the last significant coefficient position assuming the nominal scan and then code the significance (and possibly level and sign) information using another scan which uses neighborhood-based contexts. Similarly, although the techniques have been described for 4×4 and 8×8 blocks, they can be extended to any block size where horizontal and/or vertical scans may be used.
  • In one example, rather than utilizing separate contexts for each transform coefficient based on its position in the transform block, the video coder (e.g., video encoder 32 or video decoder 42) may determine which context to use for coding a transform coefficient based on the row index or the column index of the transform coefficient. For example, for a horizontal scan, all transform coefficients in the same row may share the same context, and the video coder may utilize different contexts for transform coefficients in different rows. For a vertical scan, all transform coefficients in the same column may share the same context, and the video coder may utilize different contexts for transform coefficients in different columns.
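  • A sketch of the row/column-based sharing just described; the number of distinct contexts and the clamping are assumptions for illustration:

```python
def row_column_context(scan_order, row, col, num_contexts=4):
    """For a horizontal scan, all coefficients in the same row share a
    context; for a vertical scan, all coefficients in the same column do."""
    if scan_order == 'horizontal':
        return min(row, num_contexts - 1)
    if scan_order == 'vertical':
        return min(col, num_contexts - 1)
    raise ValueError('row/column sharing applies only to horizontal/vertical scans')
```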
  • Some other techniques may use multiple context sets based on coefficient position for coding of significance maps for block sizes of 16×16 and higher. Similarly, JCTVC-H0228 (and also HM5.0) uses the sum of the row and column indices to determine the context set. In the case of JCTVC-H0228, this is done even for horizontal and vertical scans.
  • In some example techniques of this disclosure, the context set used to code the significance or level for a particular coefficient for horizontal scan may depend only on the row index of the coefficient. Similarly, the context set to code the significance or level for a coefficient in case of vertical scan may depend only on the column index of the coefficient.
  • In some example techniques of this disclosure, the context set may depend only on the absolute index of the coefficient in the scan. Different scans may use different functions to derive the context set.
  • Furthermore, as described above, horizontal, vertical and 4×4 sub-block-based diagonal scans may use separate context sets or the horizontal and vertical scans may share context sets. In some examples, not only the context set but also the context itself depends only on the absolute index of the coefficient in the scanning order.
  • In some examples, the video coder (e.g., video encoder 32 or video decoder 42) may be configured to implement only one type of scan (e.g., a diagonal scan). However, the neighboring regions that the video coder evaluates may be based on the nominal scan. The nominal scan is the scan the video coder would have performed had it been able to perform other scans. For instance, video encoder 32 may signal that the horizontal scan is to be used. However, video decoder 42 may implement the diagonal scan instead, while the neighboring regions that the video coder evaluates are based on the signaling that the horizontal scan is to be used. The same would apply for the vertical scan.
  • In some examples, if the nominal scan is the horizontal scan, then the video coder may stretch the neighboring region that is evaluated in the horizontal direction relative to the regions that are currently used. The same would apply when the nominal scan is the vertical scan, but in the vertical direction. The stretching of the neighboring region may be referred to as varying the region. For example, if the nominal scan is horizontal, then rather than evaluating a transform coefficient that is two rows down from where the current transform coefficient being coded is located, the video coder may evaluate the transform coefficient that is three columns apart from where the current transform coefficient is located. The same would apply when the nominal scan is the vertical scan, but the transform coefficient would be located three rows apart from where the current transform coefficient (e.g., the one being coded) is located.
  • FIG. 12 is a conceptual diagram illustrating a context neighborhood for a nominal horizontal scan. FIG. 12 illustrates 8×8 block 134 that includes 4×4 sub-blocks 136A-136D. Compared to the context neighborhood in some other techniques, the coefficient two rows down has been replaced by the coefficient that is in the same row but three columns apart (X4). Similarly, if the nominal scan is vertical, a context neighborhood that is stretched in the vertical direction may be used.
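  • A sketch of the stretched neighborhood for a nominal horizontal scan: relative to the current coefficient at (r, c), the baseline template below is an assumed neighborhood modeled on neighborhood-based schemes, and the stretched template swaps the coefficient two rows down, (2, 0), for the one three columns to the right, (0, 3), per the description of FIG. 12:

```python
# (d_row, d_col) offsets relative to the current coefficient; both
# templates are illustrative assumptions, not taken from a standard.
BASELINE_TEMPLATE  = [(0, 1), (0, 2), (1, 0), (1, 1), (2, 0)]
HORIZONTAL_STRETCH = [(0, 1), (0, 2), (1, 0), (1, 1), (0, 3)]

def neighborhood_count(sig, r, c, template):
    """Count already-coded significant neighbors of (r, c) that fall inside
    the block; the count would then select the significance context."""
    n = len(sig)
    return sum(sig[r + dr][c + dc]
               for dr, dc in template
               if r + dr < n and c + dc < n)
```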
  • In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
  • By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
  • The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
  • Various examples have been described. These and other examples are within the scope of the following claims.
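
As a minimal sketch of the stretched neighborhood described above, the following C fragment shows one plausible way to select the neighbor offsets as a function of the nominal scan. The type names, function name, and baseline offsets are assumptions chosen to mirror the FIG. 12 example (for a horizontal scan, the coefficient two rows down is replaced by the coefficient three columns to the right; for a vertical scan, the transposed, vertically stretched arrangement is used), not a definitive implementation.

    typedef enum { SCAN_DIAGONAL, SCAN_HORIZONTAL, SCAN_VERTICAL } ScanOrder;

    typedef struct { int dx; int dy; } Offset;  /* (columns right, rows down) from current coeff */

    /* Fills nb[0..4] with the neighbor offsets (X0-X4) used to derive the
       significance-flag context of the current coefficient. For the horizontal
       scan, the neighbor two rows down is replaced by one three columns to the
       right, stretching the region horizontally; the vertical scan uses the
       transposed offsets, i.e., a neighbor three rows down. */
    static void getContextNeighborhood(ScanOrder scan, Offset nb[5])
    {
        static const Offset diag[5] = {      /* baseline (diagonal-scan) region */
            { 1, 0 }, { 2, 0 },              /* one and two columns to the right */
            { 0, 1 }, { 0, 2 },              /* one and two rows down */
            { 1, 1 }                         /* diagonal neighbor */
        };
        static const Offset stretched[5] = { /* horizontally stretched region */
            { 1, 0 }, { 2, 0 },
            { 0, 1 },
            { 3, 0 },                        /* replaces (0, 2): three columns right */
            { 1, 1 }
        };
        const Offset *base = (scan == SCAN_DIAGONAL) ? diag : stretched;
        for (int i = 0; i < 5; ++i) {
            Offset o = base[i];
            if (scan == SCAN_VERTICAL) {     /* mirror the stretch vertically */
                int t = o.dx; o.dx = o.dy; o.dy = t;
            }
            nb[i] = o;
        }
    }

A coder might then, for each offset that falls inside the block, test whether the coefficient at that position is significant and map the resulting count to a context index; that mapping is outside the scope of this sketch.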
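
Likewise, the scan-order-dependent and size-dependent context selection recited in the following claims can be sketched as a small selector. The set indices, the 8x8 size check, and the function name are hypothetical; the sketch only illustrates that horizontal and vertical scans may share one set of significance-flag contexts while other scan orders use a different set, and that some block sizes may use a single set for all scan orders.

    /* Reusing the ScanOrder type from the sketch above. Returns an index naming
       which set of CABAC contexts to use for the significance flags of a block:
       for an 8x8 block, horizontal and vertical scans share one set while other
       scans use another; for other block sizes a single set serves all scans. */
    static int significanceContextSet(int blockSize, ScanOrder scan)
    {
        if (blockSize != 8)
            return 0;    /* same contexts regardless of scan order */
        if (scan == SCAN_HORIZONTAL || scan == SCAN_VERTICAL)
            return 1;    /* shared horizontal/vertical set */
        return 2;        /* different set for the diagonal scan */
    }

Under these assumptions, an 8×8 block coded with a vertical scan yields the same set index as one coded with a horizontal scan, and the returned index would then be combined with the coefficient position and block size to look up the actual CABAC context.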

Claims (36)

What is claimed is:
1. A method for decoding video data, the method comprising:
receiving, from a coded bitstream, significance flags of transform coefficients of a block;
determining a scan order for the transform coefficients of the block;
determining contexts for the significance flags of the transform coefficients of the block based on the determined scan order; and
context adaptive binary arithmetic coding (CABAC) decoding the significance flags of the transform coefficients based at least on the determined contexts.
2. The method of claim 1, wherein determining the contexts comprises determining the contexts based on size of the block, positions of the transform coefficients within the block, and the scan order.
3. The method of claim 1, wherein determining the contexts comprises:
determining the contexts that are the same if the determined scan order is a horizontal scan and if the determined scan order is a vertical scan; and
determining the contexts, which are different than the contexts if the determined scan order is the horizontal scan and if the determined scan order is the vertical scan, if the determined scan order is not the horizontal scan or the vertical scan.
4. The method of claim 1, wherein determining contexts for the significance flags of the transform coefficients of the block based on the determined scan order comprises determining the same contexts if the scan order is a horizontal scan order or a vertical scan order.
5. The method of claim 1, wherein determining the contexts comprises:
determining a first set of contexts for the significance flags if the scan order is a first scan order; and
determining a second set of contexts for the significance flags if the scan order is a second scan order.
6. The method of claim 5, wherein the first set of contexts is the same as the second set of contexts if the first scan order is a horizontal scan and the second scan order is a vertical scan.
7. The method of claim 5, wherein the first set of contexts is different than the second set of contexts if the first scan order is one of a horizontal scan or a vertical scan and the second scan order is not the horizontal scan or the vertical scan.
8. The method of claim 1, wherein determining the contexts comprises determining the contexts for the significance flags of the transform coefficients of the block based on the determined scan order and based on size of the block.
9. The method of claim 1, further comprising:
determining whether size of the block is a first size or a second size,
wherein, if the size of the block is the first size, determining the contexts comprises determining the contexts that are the same for all scan orders, and
wherein, if the size of the block is the second size, determining the contexts comprises determining the contexts that are different for at least two different scan orders.
10. The method of claim 1, wherein the block comprises an 8×8 block of transform coefficients.
11. A method for encoding video data, the method comprising:
determining a scan order for transform coefficients of a block;
determining contexts for significance flags of the transform coefficients of the block based on the determined scan order;
context adaptive binary arithmetic coding (CABAC) encoding the significance flags of the transform coefficients based at least on the determined contexts; and
signaling the encoded significance flags in a coded bitstream.
12. The method of claim 11, wherein determining the contexts comprises determining the contexts based on size of the block, positions of the transform coefficients within the block, and the scan order.
13. The method of claim 11, wherein determining the contexts comprises:
determining the contexts that are the same if the determined scan order is a horizontal scan and if the determined scan order is a vertical scan; and
determining the contexts, which are different than the contexts if the determined scan order is the horizontal scan and if the determined scan order is the vertical scan, if the determined scan order is not the horizontal scan or the vertical scan.
14. The method of claim 11, wherein determining contexts for the significance flags of the transform coefficients of the block based on the determined scan order comprises determining the same contexts if the scan order is a horizontal scan order or a vertical scan order.
15. The method of claim 11, wherein determining the contexts comprises:
determining a first set of contexts for the significance flags if the scan order is a first scan order; and
determining a second set of contexts for the significance flags if the scan order is a second scan order.
16. The method of claim 15, wherein the first set of contexts is the same as the second set of contexts if the first scan order is a horizontal scan and the second scan order is a vertical scan.
17. The method of claim 15, wherein the first set of contexts is different than the second set of contexts if the first scan order is one of a horizontal scan or a vertical scan and the second scan order is not the horizontal scan or the vertical scan.
18. The method of claim 11, wherein determining the contexts comprises determining the contexts for the significance flags of the transform coefficients of the block based on the determined scan order and based on size of the block.
19. The method of claim 11, wherein the block comprises an 8×8 block of transform coefficients.
20. An apparatus for coding video data, the apparatus comprising a video coder configured to:
determine a scan order for transform coefficients of a block;
determine contexts for significance flags of the transform coefficients of the block based on the determined scan order; and
context adaptive binary arithmetic coding (CABAC) code the significance flags of the transform coefficients based at least on the determined contexts.
21. The apparatus of claim 20, wherein the video coder comprises a video decoder, and wherein the video decoder is configured to:
receive, from a coded bitstream, the significance flags of the transform coefficients of the block; and
CABAC decode the significance flags of the transform coefficients based on the determined contexts.
22. The apparatus of claim 20, wherein the video coder comprises a video encoder, and wherein the video encoder is configured to:
CABAC encode the significance flags of the transform coefficients based on the determined contexts; and
signal, in a coded bitstream, the significance flags of the transform coefficients.
23. The apparatus of claim 20, wherein, to determine the contexts, the video coder is configured to determine the contexts based on size of the block, positions of the transform coefficients within the block, and the scan order.
24. The apparatus of claim 20, wherein, to determine the contexts, the video coder is configured to:
determine the contexts that are the same if the determined scan order is a horizontal scan and if the determined scan order is a vertical scan; and
determine the contexts, which are different than the contexts if the determined scan order is the horizontal scan and if the determined scan order is the vertical scan, if the determined scan order is not the horizontal scan or the vertical scan.
25. The apparatus of claim 20, wherein, to determine contexts for the significance flags of the transform coefficients of the block based on the determined scan order, the video coder is configured to determine the same contexts if the scan order is a horizontal scan order or a vertical scan order.
26. The apparatus of claim 20, wherein, to determine the contexts, the video coder is configured to:
determine a first set of contexts for the significance flags if the scan order is a first scan order; and
determine a second set of contexts for the significance flags if the scan order is a second scan order.
27. The apparatus of claim 26, wherein the first set of contexts is the same as the second set of contexts if the first scan order is a horizontal scan and the second scan order is a vertical scan.
28. The apparatus of claim 26, wherein the first set of contexts is different than the second set of contexts if the first scan order is one of a horizontal scan or a vertical scan and the second scan order is not the horizontal scan or the vertical scan.
29. The apparatus of claim 20, wherein, to determine the contexts, the video coder is configured to determine the contexts for the significance flags of the transform coefficients of the block based on the determined scan order and based on size of the block.
30. The apparatus of claim 20, wherein the video coder is configured to:
determine whether size of the block is a first size or a second size,
wherein, if the size of the block is the first size, the video coder is configured to determine the contexts that are the same for all scan orders, and
wherein, if the size of the block is the second size, the video coder is configured to determine the contexts that are different for at least two different scan orders.
31. The apparatus of claim 20, wherein the block comprises an 8×8 block of transform coefficients.
32. The apparatus of claim 20, wherein the apparatus comprises one of:
a microprocessor;
an integrated circuit (IC); and
a wireless communication device that includes the video coder.
33. An apparatus for coding video data, the apparatus comprising:
means for determining a scan order for transform coefficients of a block;
means for determining contexts for significance flags of the transform coefficients of the block based on the determined scan order; and
means for context adaptive binary arithmetic coding (CABAC) the significance flags of the transform coefficients based at least on the determined contexts.
34. The apparatus of claim 33, wherein the means for determining the contexts comprises means for determining the contexts based on size of the block, positions of the transform coefficients within the block, and the scan order.
35. A computer-readable storage medium having instructions stored thereon that when executed cause one or more processors of an apparatus for coding video data to:
determine a scan order for transform coefficients of a block;
determine contexts for significance flags of the transform coefficients of the block based on the determined scan order; and
context adaptive binary arithmetic coding (CABAC) code the significance flags of the transform coefficients based at least on the determined contexts.
36. The computer-readable storage medium of claim 35, wherein the instructions that cause the one or more processors to determine the contexts comprise instructions that cause the one or more processors to determine the contexts based on size of the block, positions of the transform coefficients within the block, and the scan order.
US13/862,818 2012-04-16 2013-04-15 Transform coefficient coding Abandoned US20130272423A1 (en)

Priority Applications (15)

Application Number Priority Date Filing Date Title
US13/862,818 US20130272423A1 (en) 2012-04-16 2013-04-15 Transform coefficient coding
PCT/US2013/036779 WO2013158642A1 (en) 2012-04-16 2013-04-16 Transform coefficient coding
JP2015505990A JP2015516768A (en) 2012-04-16 2013-04-16 Transform coefficient coding
CN201380019906.1A CN104247420A (en) 2012-04-16 2013-04-16 Transform coefficient coding
AU2013249427A AU2013249427A1 (en) 2012-04-16 2013-04-16 Transform coefficient coding
TW102113542A TW201352004A (en) 2012-04-16 2013-04-16 Transform coefficient coding
RU2014145851A RU2014145851A (en) 2012-04-16 2013-04-16 TRANSFORMATION CODING CODING
KR20147031985A KR20150003327A (en) 2012-04-16 2013-04-16 Transform coefficient coding
EP13718986.6A EP2839646A1 (en) 2012-04-16 2013-04-16 Transform coefficient coding
SG11201405856XA SG11201405856XA (en) 2012-04-16 2013-04-16 Transform coefficient coding
CA2869305A CA2869305A1 (en) 2012-04-16 2013-04-16 Transform coefficient coding
IL234708A IL234708A0 (en) 2012-04-16 2014-09-17 Transform coefficient coding
PH12014502144A PH12014502144A1 (en) 2012-04-16 2014-09-25 Transform coefficient coding
ZA2014/07860A ZA201407860B (en) 2012-04-16 2014-10-28 Transform coefficient coding
HK15101986.7A HK1201661A1 (en) 2012-04-16 2015-02-27 Transform coefficient coding

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201261625039P 2012-04-16 2012-04-16
US201261667382P 2012-07-02 2012-07-02
US13/862,818 US20130272423A1 (en) 2012-04-16 2013-04-15 Transform coefficient coding

Publications (1)

Publication Number Publication Date
US20130272423A1 true US20130272423A1 (en) 2013-10-17

Family

ID=49325050

Family Applications (3)

Application Number Title Priority Date Filing Date
US13/832,909 Active 2034-04-16 US9124872B2 (en) 2012-04-16 2013-03-15 Coefficient groups and coefficient coding for coefficient scans
US13/834,006 Active 2034-08-29 US9621921B2 (en) 2012-04-16 2013-03-15 Coefficient groups and coefficient coding for coefficient scans
US13/862,818 Abandoned US20130272423A1 (en) 2012-04-16 2013-04-15 Transform coefficient coding

Family Applications Before (2)

Application Number Title Priority Date Filing Date
US13/832,909 Active 2034-04-16 US9124872B2 (en) 2012-04-16 2013-03-15 Coefficient groups and coefficient coding for coefficient scans
US13/834,006 Active 2034-08-29 US9621921B2 (en) 2012-04-16 2013-03-15 Coefficient groups and coefficient coding for coefficient scans

Country Status (19)

Country Link
US (3) US9124872B2 (en)
EP (3) EP2839645B1 (en)
JP (4) JP6525865B2 (en)
KR (3) KR102115049B1 (en)
CN (3) CN104247421B (en)
AR (1) AR091338A1 (en)
AU (2) AU2013249532A1 (en)
CA (2) CA2868533A1 (en)
DK (1) DK2839645T3 (en)
ES (1) ES2637490T3 (en)
HK (2) HK1201103A1 (en)
IL (2) IL234705A0 (en)
PH (2) PH12014502144A1 (en)
RU (2) RU2014145852A (en)
SG (2) SG11201405867WA (en)
SI (1) SI2839645T1 (en)
TW (2) TW201349867A (en)
WO (3) WO2013158563A1 (en)
ZA (2) ZA201407860B (en)

Families Citing this family (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7582415B2 (en) 2001-09-06 2009-09-01 Don Straus Rapid detection of replicating cells
CN105357540B (en) * 2011-06-28 2019-09-06 三星电子株式会社 The method that video is decoded
US8891630B2 (en) * 2011-10-24 2014-11-18 Blackberry Limited Significance map encoding and decoding using partition set based context assignment
JP2014533058A (en) * 2011-11-08 2014-12-08 サムスン エレクトロニクス カンパニー リミテッド Video arithmetic encoding method and apparatus, and video arithmetic decoding method and apparatus
AU2012200319B2 (en) * 2012-01-19 2015-11-26 Canon Kabushiki Kaisha Method, apparatus and system for encoding and decoding the significance map for residual coefficients of a transform unit
CN104350753B (en) 2012-06-01 2019-07-09 威勒斯媒体国际有限公司 Arithmetic decoding device, picture decoding apparatus, arithmetic coding device and picture coding device
US9813737B2 (en) * 2013-09-19 2017-11-07 Blackberry Limited Transposing a block of transform coefficients, based upon an intra-prediction mode
KR102333000B1 (en) * 2015-01-15 2021-12-01 한국전자통신연구원 Method for fast transform coefficient coding and apparatus for the same
US10574993B2 (en) * 2015-05-29 2020-02-25 Qualcomm Incorporated Coding data using an enhanced context-adaptive binary arithmetic coding (CABAC) design
CA2988451C (en) 2015-06-23 2021-01-19 Mediatek Singapore Pte. Ltd. Method and apparatus for transform coefficient coding of non-square blocks
US10784901B2 (en) 2015-11-12 2020-09-22 Qualcomm Incorporated Puncturing for structured low density parity check (LDPC) codes
US11043966B2 (en) 2016-05-11 2021-06-22 Qualcomm Incorporated Methods and apparatus for efficiently generating multiple lifted low-density parity-check (LDPC) codes
US10454499B2 (en) 2016-05-12 2019-10-22 Qualcomm Incorporated Enhanced puncturing and low-density parity-check (LDPC) code structure
US10291354B2 (en) 2016-06-14 2019-05-14 Qualcomm Incorporated High performance, flexible, and compact low-density parity-check (LDPC) code
EP3264763A1 (en) * 2016-06-29 2018-01-03 Thomson Licensing Method and apparatus for improved significance flag coding using simple local predictor
US10972733B2 (en) 2016-07-15 2021-04-06 Qualcomm Incorporated Look-up table for enhanced multiple transform
BR112019021584B1 (en) * 2017-04-13 2022-06-28 Lg Electronics Inc. IMAGE ENCODING/DECODING METHOD AND DEVICE FOR THE SAME
CN108881909A (en) * 2017-05-09 2018-11-23 富士通株式会社 Scanning sequency generation method and scanning sequency generating device
CN107071494B (en) * 2017-05-09 2019-10-11 珠海市杰理科技股份有限公司 The generation method and system of the binary syntax element of video image frame
US10312939B2 (en) 2017-06-10 2019-06-04 Qualcomm Incorporated Communication techniques involving pairwise orthogonality of adjacent rows in LPDC code
JP7198268B2 (en) * 2017-07-31 2022-12-28 エレクトロニクス アンド テレコミュニケーションズ リサーチ インスチチュート Image decoding method, image encoding method and computer readable recording medium
US10523968B2 (en) 2017-09-18 2019-12-31 Google Llc Coding of last significant coefficient flags
KR102628530B1 (en) * 2017-10-20 2024-01-24 에스케이텔레콤 주식회사 Apparatus and Method for Video Encoding or Decoding
WO2019078693A1 (en) * 2017-10-20 2019-04-25 에스케이텔레콤 주식회사 Apparatus and method for image encoding or decoding
WO2019117402A1 (en) * 2017-12-13 2019-06-20 삼성전자 주식회사 Video decoding method and device thereof, and video encoding method and device thereof
WO2019135448A1 (en) * 2018-01-02 2019-07-11 삼성전자 주식회사 Method for decoding video and apparatus therefor and method for encoding video and apparatus therefor
WO2019199838A1 (en) * 2018-04-12 2019-10-17 Futurewei Technologies, Inc. Reducing context switching for coding transform coefficients
EP3562156A1 (en) * 2018-04-27 2019-10-30 InterDigital VC Holdings, Inc. Method and apparatus for adaptive context modeling in video encoding and decoding
JP7520809B2 (en) * 2018-09-21 2024-07-23 インターデジタル ヴイシー ホールディングス, インコーポレイテッド A scalar quantizer decision scheme for scalar quantization dependencies.
EP3857882A1 (en) * 2018-09-24 2021-08-04 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Efficient coding of transform coefficients using or suitable for a combination with dependent scalar quantization
CN113170137B (en) * 2018-11-21 2024-09-27 交互数字Vc控股公司 Residual coding to reduce use of local neighborhood
US11102513B2 (en) 2018-12-06 2021-08-24 Tencent America LLC One-level transform split and adaptive sub-block transform
JP7257523B2 (en) 2018-12-28 2023-04-13 テレフオンアクチーボラゲット エルエム エリクソン(パブル) Method and Apparatus for Selecting Transform Choices in Encoders and Decoders
US11202100B2 (en) 2019-03-11 2021-12-14 Qualcomm Incorporated Coefficient coding for transform skip mode
CN114501009B (en) * 2019-03-21 2023-12-19 三星电子株式会社 Video decoding device and video encoding device
JP7448559B2 (en) * 2019-04-19 2024-03-12 バイトダンス インコーポレイテッド Context encoding for transform skip mode
CA3137163C (en) 2019-04-24 2024-05-14 Bytedance Inc. Constraints on quantized residual differential pulse code modulation representation of coded video
CN113796069B (en) 2019-05-01 2024-03-08 字节跳动有限公司 Intra-frame codec video using quantized residual differential pulse codec modulation codec
EP3949387A4 (en) 2019-05-02 2022-05-18 ByteDance Inc. Signaling in transform skip mode
CN113785306B (en) 2019-05-02 2024-06-14 字节跳动有限公司 Coding and decoding mode based on coding and decoding tree structure type
CN114467310B (en) * 2019-08-31 2024-10-29 Lg电子株式会社 Image decoding method and device for residual data compiling in image compiling system
WO2021096174A1 (en) * 2019-11-11 2021-05-20 엘지전자 주식회사 Transformation-based image coding method and device therefor
CN113038140B (en) * 2019-12-24 2024-05-28 扬智电子科技(成都)有限公司 Video decoding method and device for context adaptive binary arithmetic coding

Family Cites Families (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6195389B1 (en) 1998-04-16 2001-02-27 Scientific-Atlanta, Inc. Motion estimation system and methods
US7724827B2 (en) 2003-09-07 2010-05-25 Microsoft Corporation Multi-layer run level encoding and decoding
US7599435B2 (en) * 2004-01-30 2009-10-06 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Video frame encoding and decoding
CN1589023A (en) * 2004-08-06 2005-03-02 联合信源数字音视频技术(北京)有限公司 Coding and decoding method and device for multiple coded list lengthening based on context
US20090123066A1 (en) * 2005-07-22 2009-05-14 Mitsubishi Electric Corporation Image encoding device, image decoding device, image encoding method, image decoding method, image encoding program, image decoding program, computer readable recording medium having image encoding program recorded therein,
US20080123947A1 (en) 2005-07-22 2008-05-29 Mitsubishi Electric Corporation Image encoding device, image decoding device, image encoding method, image decoding method, image encoding program, image decoding program, computer readable recording medium having image encoding program recorded therein
EP1982428A2 (en) 2005-08-31 2008-10-22 Micronas USA, Inc. Macroblock neighborhood address calculation
US8306112B2 (en) 2005-09-20 2012-11-06 Mitsubishi Electric Corporation Image encoding method and image decoding method, image encoder and image decoder, and image encoded bit stream and recording medium
CA2710354C (en) 2005-09-20 2014-09-23 Mitsubishi Electric Corporation Image encoding method and image decoding method, image encoder and image decoder, and image encoded bit stream and recording medium
FR2895602B1 (en) 2005-12-22 2008-03-07 Assistance Tech Et Etude De Ma DEVICE AND METHOD FOR CABAC TYPE ENCODING
US8848789B2 (en) * 2006-03-27 2014-09-30 Qualcomm Incorporated Method and system for coding and decoding information associated with video compression
US7554468B2 (en) * 2006-08-25 2009-06-30 Sony Computer Entertainment Inc, Entropy decoding methods and apparatus using most probable and least probable signal cases
US7460725B2 (en) * 2006-11-09 2008-12-02 Calista Technologies, Inc. System and method for effectively encoding and decoding electronic information
KR101356733B1 (en) * 2007-03-07 2014-02-05 삼성전자주식회사 Method and apparatus for Context Adaptive Binary Arithmetic Coding and decoding
CN101415121B (en) 2007-10-15 2010-09-29 华为技术有限公司 Self-adapting method and apparatus for forecasting frame
JP4875024B2 (en) 2008-05-09 2012-02-15 株式会社東芝 Image information transmission device
KR20090129926A (en) * 2008-06-13 2009-12-17 삼성전자주식회사 Method and apparatus for image encoding by dynamic unit grouping, and method and apparatus for image decoding by dynamic unit grouping
BRPI0918019B1 (en) 2008-08-19 2021-05-18 Contentarmor WATERMARK COMPATIBLE WITH CABAC/CVA OF SYNTAX ELEMENTS IN COMPRESSED VIDEO
JP5492206B2 (en) 2009-07-27 2014-05-14 株式会社東芝 Image encoding method and image decoding method, and image encoding device and image decoding device
US20120044987A1 (en) 2009-12-31 2012-02-23 Broadcom Corporation Entropy coder supporting selective employment of syntax and context adaptation
CN103119849B (en) 2010-04-13 2017-06-16 弗劳恩霍夫应用研究促进协会 Probability interval partition encoding device and decoder
CN108471537B (en) 2010-04-13 2022-05-17 Ge视频压缩有限责任公司 Device and method for decoding transformation coefficient block and device for coding transformation coefficient block
KR102310816B1 (en) 2010-05-12 2021-10-13 인터디지털 매디슨 페턴트 홀딩스 에스에이에스 Methods and apparatus for unified significance map coding
US9172968B2 (en) * 2010-07-09 2015-10-27 Qualcomm Incorporated Video coding using directional transforms
US9154801B2 (en) 2010-09-30 2015-10-06 Texas Instruments Incorporated Method and apparatus for diagonal scan and simplified coding of transform coefficients
US9042440B2 (en) 2010-12-03 2015-05-26 Qualcomm Incorporated Coding the position of a last significant coefficient within a video block based on a scanning order for the block in video coding
US20120163456A1 (en) 2010-12-22 2012-06-28 Qualcomm Incorporated Using a most probable scanning order to efficiently code scanning order information for a video block in video coding
WO2012093969A1 (en) * 2011-01-07 2012-07-12 Agency For Science, Technology And Research Method and an apparatus for coding an image
WO2012098868A1 (en) 2011-01-19 2012-07-26 パナソニック株式会社 Image-encoding method, image-decoding method, image-encoding device, image-decoding device, and image-encoding/decoding device
US20120207400A1 (en) 2011-02-10 2012-08-16 Hisao Sasai Image coding method, image coding apparatus, image decoding method, image decoding apparatus, and image coding and decoding apparatus
US8953690B2 (en) 2011-02-16 2015-02-10 Google Technology Holdings LLC Method and system for processing video data
IL290229B2 (en) 2011-06-16 2023-04-01 Ge Video Compression Llc Entropy coding of motion vector differences
US9756360B2 (en) 2011-07-19 2017-09-05 Qualcomm Incorporated Coefficient scanning in video coding
PT3166317T (en) 2011-10-31 2018-10-08 Samsung Electronics Co Ltd Method and apparatus for determining a context model for transform coefficient level entropy encoding and decoding
BR112013018850B1 (en) * 2011-12-21 2022-09-27 Sun Patent Trust IMAGE DECODING METHOD AND DEVICE, AND IMAGE ENCODING METHOD AND DEVICE
EP2803190B1 (en) 2012-01-09 2017-10-25 Dolby Laboratories Licensing Corporation Hybrid reference picture reconstruction method for multiple layered video coding systems
AU2012365727B2 (en) 2012-01-13 2015-11-05 Hfi Innovation Inc. Method and apparatus for unification of coefficient scan of 8x8 transform units in HEVC
US20130188736A1 (en) 2012-01-19 2013-07-25 Sharp Laboratories Of America, Inc. High throughput significance map processing for cabac in hevc
US8581753B2 (en) 2012-01-19 2013-11-12 Sharp Laboratories Of America, Inc. Lossless coding technique for CABAC in HEVC
US8552890B2 (en) 2012-01-19 2013-10-08 Sharp Laboratories Of America, Inc. Lossless coding with different parameter selection technique for CABAC in HEVC
US9813701B2 (en) * 2012-01-20 2017-11-07 Google Technology Holdings LLC Devices and methods for context reduction in last significant coefficient position coding
US9124872B2 (en) 2012-04-16 2015-09-01 Qualcomm Incorporated Coefficient groups and coefficient coding for coefficient scans
EP2866443A4 (en) * 2012-06-22 2016-06-15 Sharp Kk Arithmetic decoding device, arithmetic coding device, image decoding device and image coding device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120082235A1 (en) * 2010-10-05 2012-04-05 General Instrument Corporation Coding and decoding utilizing context model selection with adaptive scan pattern
US20120229478A1 (en) * 2011-03-08 2012-09-13 Texas Instruments Incorporated Reduced context dependency at transform edges for parallel context processing
US20130003857A1 (en) * 2011-06-29 2013-01-03 General Instrument Corporation Methods and system for using a scan coding pattern during inter coding
US20130235925A1 (en) * 2012-03-08 2013-09-12 Research In Motion Limited Unified transform coefficient encoding and decoding

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9621921B2 (en) 2012-04-16 2017-04-11 Qualcomm Incorporated Coefficient groups and coefficient coding for coefficient scans
US9124872B2 (en) 2012-04-16 2015-09-01 Qualcomm Incorporated Coefficient groups and coefficient coding for coefficient scans
US20160021396A1 (en) * 2013-03-08 2016-01-21 Board Of Regents, The University Of Texas System Systems and methods for digital media compression and recompression
US10382789B2 (en) * 2013-03-08 2019-08-13 Board Of Regents Of The University Of Texas System Systems and methods for digital media compression and recompression
US10123044B2 (en) * 2015-07-16 2018-11-06 Mediatek Inc. Partial decoding circuit of video encoder/decoder for dealing with inverse second transform and partial encoding circuit of video encoder for dealing with second transform
US10798390B2 (en) 2016-02-12 2020-10-06 Huawei Technologies Co., Ltd. Method and apparatus for scan order selection
US11582455B2 (en) 2016-02-12 2023-02-14 Huawei Technologies Co., Ltd. Method and apparatus for scan order selection
US12096018B2 (en) 2016-08-31 2024-09-17 Kt Corporation Method and apparatus for processing video signal
US11700389B2 (en) 2016-08-31 2023-07-11 Kt Corporation Method and apparatus for processing video signal
US10630974B2 (en) * 2017-05-30 2020-04-21 Google Llc Coding of intra-prediction modes
US11695962B2 (en) 2017-11-23 2023-07-04 Interdigital Vc Holdings, Inc. Encoding and decoding methods and corresponding devices
US11070820B2 (en) 2018-11-06 2021-07-20 Beijing Bytedance Network Technology Co., Ltd. Condition dependent inter prediction with geometric partitioning
US11070821B2 (en) 2018-11-06 2021-07-20 Beijing Bytedance Network Technology Co., Ltd. Side information signaling for inter prediction with geometric partitioning
US11159808B2 (en) 2018-11-06 2021-10-26 Beijing Bytedance Network Technology Co., Ltd. Using inter prediction with geometric partitioning for video processing
US11166031B2 (en) 2018-11-06 2021-11-02 Beijing Bytedance Network Technology Co., Ltd. Signaling of side information for inter prediction with geometric partitioning
US11457226B2 (en) 2018-11-06 2022-09-27 Beijing Bytedance Network Technology Co., Ltd. Side information signaling for inter prediction with geometric partitioning
US11570450B2 (en) 2018-11-06 2023-01-31 Beijing Bytedance Network Technology Co., Ltd. Using inter prediction with geometric partitioning for video processing
US11611763B2 (en) 2018-11-06 2023-03-21 Beijing Bytedance Network Technology Co., Ltd. Extensions of inter prediction with geometric partitioning
US11956431B2 (en) 2018-12-30 2024-04-09 Beijing Bytedance Network Technology Co., Ltd Conditional application of inter prediction with geometric partitioning in video processing
WO2020143742A1 (en) * 2019-01-10 2020-07-16 Beijing Bytedance Network Technology Co., Ltd. Simplified context modeling for context adaptive binary arithmetic coding
CN113170139A (en) * 2019-01-10 2021-07-23 北京字节跳动网络技术有限公司 Simplified context modeling for context adaptive binary arithmetic coding
US11032572B2 (en) * 2019-05-17 2021-06-08 Qualcomm Incorporated Low-frequency non-separable transform signaling based on zero-out patterns for video coding
US11695960B2 (en) 2019-06-14 2023-07-04 Qualcomm Incorporated Transform and last significant coefficient position signaling for low-frequency non-separable transform in video coding
US20210321107A1 (en) * 2020-04-13 2021-10-14 Qualcomm Incorporated Coefficient coding for support of different color formats in video coding
US11785219B2 (en) * 2020-04-13 2023-10-10 Qualcomm Incorporated Coefficient coding for support of different color formats in video coding
US20230097724A1 (en) * 2021-02-21 2023-03-30 Tencent Technology (Shenzhen) Company Limited Video encoding method and apparatus, video decoding method and apparatus, computer-readable medium, and electronic device

Also Published As

Publication number Publication date
IL234708A0 (en) 2014-11-30
CA2869305A1 (en) 2013-10-24
KR20150003320A (en) 2015-01-08
SI2839645T1 (en) 2017-11-30
CN104247421B (en) 2018-01-19
PH12014502156A1 (en) 2014-12-10
JP2015513291A (en) 2015-04-30
SG11201405867WA (en) 2014-11-27
DK2839645T3 (en) 2017-08-21
US9621921B2 (en) 2017-04-11
EP2839645A1 (en) 2015-02-25
ES2637490T3 (en) 2017-10-13
WO2013158642A1 (en) 2013-10-24
JP2015516767A (en) 2015-06-11
JP6525865B2 (en) 2019-06-05
ZA201407895B (en) 2016-05-25
WO2013158566A9 (en) 2014-11-27
JP2015516768A (en) 2015-06-11
TW201349867A (en) 2013-12-01
EP2839646A1 (en) 2015-02-25
PH12014502144A1 (en) 2014-12-01
RU2014145852A (en) 2016-06-10
AU2013249532A1 (en) 2014-10-23
WO2013158563A1 (en) 2013-10-24
HK1201103A1 (en) 2015-08-21
ZA201407860B (en) 2016-09-28
CN104221289A (en) 2014-12-17
KR20150003319A (en) 2015-01-08
JP6542400B2 (en) 2019-07-10
JP2018110405A (en) 2018-07-12
EP2839645B1 (en) 2017-05-17
CA2868533A1 (en) 2013-10-24
RU2014145851A (en) 2016-06-10
US20130272378A1 (en) 2013-10-17
HK1201661A1 (en) 2015-09-04
CN104247421A (en) 2014-12-24
US9124872B2 (en) 2015-09-01
SG11201405856XA (en) 2015-06-29
US20130272379A1 (en) 2013-10-17
EP2839584A1 (en) 2015-02-25
AU2013249427A1 (en) 2014-10-30
CN104247420A (en) 2014-12-24
TW201352004A (en) 2013-12-16
IL234705A0 (en) 2014-11-30
WO2013158566A1 (en) 2013-10-24
AR091338A1 (en) 2015-01-28
KR102115049B1 (en) 2020-05-25
KR20150003327A (en) 2015-01-08

Similar Documents

Publication Publication Date Title
US9832485B2 (en) Context adaptive entropy coding for non-square blocks in video coding
US20130272423A1 (en) Transform coefficient coding
US9462275B2 (en) Residual quad tree (RQT) coding for video coding
US9538175B2 (en) Context derivation for context-adaptive, multi-level significance coding
AU2012332242B2 (en) Intra-mode video coding
US9357185B2 (en) Context optimization for last significant coefficient position coding
US9338451B2 (en) Common spatial candidate blocks for parallel motion estimation
US9288508B2 (en) Context reduction for context adaptive binary arithmetic coding
CA2854509C (en) Progressive coding of position of last significant coefficient
US9654772B2 (en) Context adaptive entropy coding with a reduced initialization value set
US9826238B2 (en) Signaling syntax elements for transform coefficients for sub-sets of a leaf-level coding unit
US20130003859A1 (en) Transition between run and level coding modes
US20130182758A1 (en) Determining contexts for coding transform coefficient data in video coding
US20130114691A1 (en) Adaptive initialization for context adaptive entropy coding

Legal Events

Date Code Title Description
AS Assignment

Owner name: QUALCOMM INCORPORATED, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHIEN, WEI-JUNE;SOLE ROJALS, JOEL;CHEN, JIANLE;AND OTHERS;REEL/FRAME:030413/0600

Effective date: 20130415

AS Assignment

Owner name: QUALCOMM INCORPORATED, CALIFORNIA

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE SPELLING OF FIRST CONVEYING PARTY FROM WEI-JUNE CHIEN TO WEI-JUNG CHIEN. PREVIOUSLY RECORDED ON REEL 030413 FRAME 0600. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT.;ASSIGNORS:CHIEN, WEI-JUNG;SOLE ROJALS, JOEL;CHEN, JIANLE;AND OTHERS;REEL/FRAME:030441/0338

Effective date: 20130415

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION