WO2021077914A1 - Video encoding method, apparatus, computer device, and storage medium - Google Patents
- Publication number
- WO2021077914A1 (application PCT/CN2020/113153)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- pixel
- coding unit
- gradient data
- division
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/119—Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
- H04N19/14—Coding unit complexity, e.g. amount of activity or edge presence estimation
- H04N19/147—Data rate or code amount at the encoder output according to rate distortion criteria
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
- H04N19/182—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/577—Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
- H04N19/593—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
- H04N19/70—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
- H04N19/91—Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
- H04N19/96—Tree coding, e.g. quad-tree coding
Definitions
- This application relates to the field of video, in particular to video encoding methods, devices, computer equipment and storage media.
- each of the prediction modes performs rate-distortion cost calculation, and determines the division method of coding units according to the rate-distortion cost.
- this method is computationally expensive, resulting in low video coding efficiency.
- a video encoding method, device, computer equipment, and storage medium are provided.
- a video encoding method executed by a computer device, comprising: obtaining a target coding unit to be encoded; calculating target pixel gradient data according to the pixel values of the pixels corresponding to the target coding unit, the target pixel gradient data being obtained according to the difference between the pixel value of a pixel and a reference pixel value; determining a target division decision result corresponding to the target coding unit according to the target pixel gradient data; and performing video encoding on the target coding unit according to the target division decision result.
- a video encoding device, comprising: a target coding unit acquisition module, configured to obtain a target coding unit to be encoded; a target pixel gradient data acquisition module, configured to calculate target pixel gradient data according to the pixel values of the pixels corresponding to the target coding unit, the target pixel gradient data being obtained according to the difference between the pixel value of a pixel and a reference pixel value; a target division decision result determination module, configured to determine a target division decision result corresponding to the target coding unit according to the target pixel gradient data; and a video encoding module, configured to perform video encoding on the target coding unit according to the target division decision result.
- a computer device includes a memory and a processor.
- the memory stores computer readable instructions.
- the processor executes the steps of the video encoding method.
- One or more non-volatile storage media storing computer-readable instructions.
- the processors execute the steps of the above-mentioned video encoding method.
- FIG. 1 is a block diagram of a communication system provided in some embodiments.
- FIG. 2 is a block diagram of a video encoder in some embodiments.
- FIG. 3 is a flowchart of a video encoding method in some embodiments.
- FIG. 4 is a schematic diagram of image block division in some embodiments.
- FIG. 5 is a flowchart of calculating the target pixel gradient data according to the pixel values of the pixels corresponding to the target coding unit in some embodiments.
- FIG. 6 is a schematic diagram of obtaining target neighboring pixels corresponding to a current pixel in some embodiments.
- FIG. 7 is a schematic diagram of gradient calculation directions in some embodiments.
- FIG. 8 is a schematic diagram of gradient calculation directions in some embodiments.
- FIG. 9 is a flowchart of dividing a target coding unit in some embodiments.
- FIG. 10 is a schematic diagram of sub-coding units obtained by dividing a target coding unit in some embodiments.
- FIG. 11 is a structural block diagram of a video encoding device in some embodiments.
- FIG. 12 is a block diagram of the internal structure of a computer device in some embodiments.
- the terms "first", "second", etc. used in the present application can describe various elements, but unless otherwise specified, these elements are not limited by these terms. The terms are only used to distinguish one element from another.
- for example, a first area may be referred to as a second area, and similarly, a second area may be referred to as a first area.
- Fig. 1 is a simplified block diagram of a communication system (100) according to an embodiment disclosed in the present application.
- the communication system (100) includes a plurality of terminal devices, which can communicate with each other through, for example, a network (150).
- the communication system (100) includes a first terminal device (110) and a second terminal device (120) interconnected through a network (150).
- the first terminal device (110) and the second terminal device (120) perform one-way data transmission.
- the first terminal device (110) may encode video data (for example, a video picture stream collected by the terminal device (110)) for transmission to the second terminal device (120) via the network (150).
- the encoded video data is transmitted in the form of one or more encoded video streams.
- the second terminal device (120) may receive the encoded video data from the network (150), decode the encoded video data to restore the video data, and display video pictures according to the restored video data.
- One-way data transmission is more common in applications such as media services.
- the communication system (100) includes a third terminal device (130) and a fourth terminal device (140) that perform two-way transmission of encoded video data, which can occur, for example, during a video conference.
- each of the third terminal device (130) and the fourth terminal device (140) can encode video data (for example, a video picture stream collected by the terminal device) for transmission via the network (150) to the other of the two terminal devices.
- Each of the third terminal device (130) and the fourth terminal device (140) can also receive the encoded video data transmitted by the other of the two terminal devices, decode the encoded video data to recover the video data, and display video pictures on an accessible display device according to the recovered video data.
- the first terminal device (110), the second terminal device (120), the third terminal device (130), and the fourth terminal device (140) may be servers, personal computers, and smart phones, but the principles disclosed in this application are not limited to these. The embodiments disclosed in this application are also applicable to laptop computers, tablet computers, media players and/or dedicated video conferencing devices.
- the network (150) represents any number of networks that transmit encoded video data between the first terminal device (110), the second terminal device (120), the third terminal device (130), and the fourth terminal device (140), including, for example, wired and/or wireless communication networks.
- the communication network (150) can exchange data in circuit-switched and/or packet-switched channels.
- the network (150) may include a telecommunications network, a local area network, a wide area network, and/or the Internet.
- the architecture and topology of the network (150) may be immaterial to the operations disclosed in this application.
- Fig. 2 is a block diagram of a video encoder (203) according to an embodiment disclosed in the present application.
- the video encoder (203) is arranged in the electronic device (220).
- the electronic device (220) includes a transmitter (240) (for example, a transmission circuit).
- the video encoder (203) may receive video samples from a video source (201) (not part of the electronic device (220) in the embodiment of FIG. 2), which may capture video images to be encoded by the video encoder (203).
- the video source (201) is part of an electronic device (220).
- the video source (201) may provide a source video sequence in the form of a digital video sample stream to be encoded by the video encoder (203).
- the digital video sample stream may have any suitable bit depth (e.g., 8-bit, 10-bit, 12-bit, ...), any color space (e.g., BT.601 Y CrCb, RGB, ...), and any suitable sampling structure (e.g., Y CrCb 4:2:0, Y CrCb 4:4:4).
- the video source (201) may be a storage device that stores previously prepared videos.
- the video source (201) may be a camera that collects local image information as a video sequence.
- the video data can be provided as multiple separate pictures, which are given motion when viewed in sequence.
- the picture itself can be constructed as a spatial pixel array, where each pixel can include one or more samples depending on the sampling structure, color space, etc. used. Those skilled in the art can easily understand the relationship between pixels and samples. The following section focuses on describing the sample.
- the video encoder (203) may encode and compress the pictures of the source video sequence into an encoded video sequence (243) in real time or under any other time constraints required by the application. Enforcing the appropriate encoding speed is a function of the controller (250).
- the controller (250) controls other functional units as described below and is functionally coupled to these units. For the sake of brevity, the coupling is not indicated in the figure.
- the parameters set by the controller (250) may include rate-control-related parameters (picture skipping, quantizer, the λ value of rate-distortion optimization techniques, etc.), picture size, group of pictures (GOP) layout, maximum motion vector search range, and so on.
- the controller (250) may be configured with other suitable functions related to a video encoder (203) optimized for a certain system design.
- the video encoder (203) operates in an encoding loop.
- the encoding loop may include a source encoder (230) (for example, responsible for creating symbols, such as a symbol stream, based on the input picture to be encoded and the reference pictures) and a (local) decoder (233) embedded in the video encoder (203).
- the decoder (233) reconstructs the symbols to create sample data in a manner similar to the way a (remote) decoder would create it (because, in the video compression technology considered in this application, any compression between the symbols and the encoded video stream is lossless).
- the reconstructed sample stream (sample data) is input to the reference picture memory (234).
- the content of the reference picture memory (234) is also bit-exact between the local encoder and the remote encoder.
- the reference picture samples that the encoder's prediction part "sees" are exactly the same as the sample values that the decoder will "see" when using prediction during decoding.
- This basic principle of reference picture synchronization (and the drift that occurs when synchronization cannot be maintained, for example, due to channel errors) is also used in some related technologies.
- the operation of the "local" decoder (233) can be the same as that of the "remote" decoder, that is, the decoder on the decoding side.
- the source encoder (230) may perform motion-compensated predictive encoding, which predictively codes an input picture with reference to one or more previously coded pictures in the video sequence that are designated as "reference pictures".
- the encoding engine (232) encodes the differences between pixel blocks of the input picture and pixel blocks of the reference picture(s) that may be selected as prediction references for the input picture.
- the local video decoder (233) can decode the encoded video data of pictures that may be designated as reference pictures, based on the symbols created by the source encoder (230).
- the operation of the encoding engine (232) may be a lossy process.
- the reconstructed video sequence can usually be a copy of the source video sequence with some errors.
- the local video decoder (233) replicates the decoding process that the (remote) video decoder would perform on reference pictures, and the reconstructed reference pictures can be stored in the reference picture memory (234). In this way, the video encoder (203) can locally store copies of the reconstructed reference pictures that have the same content as the reconstructed reference pictures to be obtained by the remote video decoder (absent transmission errors).
- the predictor (235) may perform a predictive search for the encoding engine (232). That is, for a new picture to be encoded, the predictor (235) can search the reference picture memory (234) for sample data (as candidate reference pixel blocks) or for certain metadata, such as reference picture motion vectors and block shapes, that can serve as an appropriate prediction reference for the new picture. The predictor (235) may operate on a sample-block-by-pixel-block basis to find a suitable prediction reference. In some cases, according to the search result obtained by the predictor (235), it can be determined that the input picture may have prediction references obtained from multiple reference pictures stored in the reference picture memory (234).
- the controller (250) can manage the encoding operation of the source encoder (230), including, for example, setting parameters and subgroup parameters for encoding video data.
- the output of all the above functional units can be entropy-encoded in the entropy encoder (245).
- the entropy encoder (245) performs lossless compression on symbols generated by various functional units according to techniques such as Huffman coding, variable length coding, and arithmetic coding, thereby converting the symbols into an encoded video sequence.
- the transmitter (240) may buffer the encoded video sequence created by the entropy encoder (245) to prepare it for transmission through the communication channel (260), which may be a hardware/software link to a storage device that stores the encoded video data.
- the transmitter (240) may combine the encoded video data from the video encoder (203) with other data to be transmitted, such as encoded audio data and/or auxiliary data streams (source not shown).
- the controller (250) can manage the operation of the video encoder (203). During encoding, the controller (250) may assign each encoded picture a certain encoded picture type, which may affect the encoding techniques that can be applied to that picture. For example, a picture can usually be assigned one of the following picture types:
- An intra picture (I picture), which can be coded and decoded without using any other picture in the sequence as a prediction source.
- Some video codecs allow different types of intra pictures, including, for example, Independent Decoder Refresh ("IDR") pictures.
- A predictive picture (P picture), which can be encoded and decoded using intra prediction or inter prediction that uses at most one motion vector and reference index to predict the sample values of each block.
- A bi-predictive picture (B picture), which can be encoded and decoded using intra prediction or inter prediction that uses at most two motion vectors and reference indices to predict the sample values of each block.
- multiple predictive pictures can use more than two reference pictures and associated metadata for reconstructing a single block.
- the source picture can usually be spatially subdivided into multiple sample blocks (for example, 4×4, 8×8, 4×8, or 16×16 sample blocks) and coded block by block. These blocks can be predictively coded with reference to other (already encoded) blocks, which are determined by the coding assignment applied to the blocks' respective pictures.
- a block of an I picture may be subjected to non-predictive coding, or the block may be subjected to predictive coding (spatial prediction or intra prediction) with reference to an already coded block of the same picture.
- the pixel block of the P picture can be predicted and coded by spatial prediction or temporal prediction with reference to a previously coded reference picture.
- the block of the B picture may be predicted and coded through spatial prediction or through temporal prediction with reference to one or two previously coded reference pictures.
- the video encoder (203) can perform the encoding operation according to a predetermined video encoding technology or standard such as ITU-T H.265 Recommendation. In operation, the video encoder (203) can perform various compression operations, including predictive encoding operations that utilize temporal and spatial redundancy in the input video sequence. Therefore, the encoded video data can conform to the syntax specified by the video encoding technology used or the standard.
- the transmitter (240) may transmit additional data when transmitting the encoded video.
- the source encoder (230) can treat such data as part of an encoded video sequence.
- the additional data may include temporal/spatial/SNR enhancement layers, other forms of redundant data such as redundant pictures and slices, SEI messages, VUI parameter set fragments, and so on.
- the captured video can be used as multiple source pictures (video pictures) in time series.
- Intra picture prediction uses the spatial correlation within a given picture, while inter picture prediction uses the (temporal or other) correlation between pictures.
- the specific picture being coded/decoded is divided into blocks, and the specific picture being coded/decoded is called the current picture.
- the block in the current picture can be coded by a vector called a motion vector.
- the motion vector points to a reference block in a reference picture, and in the case of using multiple reference pictures, the motion vector may have a third dimension for identifying the reference picture.
- bi-directional prediction technology can be used in inter picture prediction.
- two reference pictures are used, for example, the first reference picture and the second reference picture are both before the current picture in the video in the decoding order (but may be past and future respectively in the display order).
- the block in the current picture may be encoded by the first motion vector pointing to the first reference block in the first reference picture and the second motion vector pointing to the second reference block in the second reference picture.
- the block may be predicted through a combination of the first reference block and the second reference block.
- merge mode technology can be used in inter picture prediction to improve coding efficiency.
- predictions such as inter-picture prediction and intra-picture prediction are performed in units of blocks.
- the pictures in a video picture sequence are divided into coding tree units (CTU) for compression, and the CTUs in a picture have the same size, such as 64×64 pixels, 32×32 pixels, or 16×16 pixels.
- a CTU includes three coding tree blocks (CTB), and the three coding tree blocks are a luma CTB and two chroma CTBs.
- each CTU can be split into one or more coding units (CU) by a quadtree.
- a 64×64 pixel CTU can be split into one 64×64 pixel CU, four 32×32 pixel CUs, or sixteen 16×16 pixel CUs.
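The CU counts quoted above follow from repeated quadtree splitting, where each split quarters a square block. The following is a minimal sketch of that arithmetic only; the function name `quadtree_cu_count` is hypothetical, and it assumes a uniform split of the whole CTU, whereas a real encoder may split different regions to different depths.

```python
def quadtree_cu_count(ctu_size: int, cu_size: int) -> int:
    """Number of equal square CUs when a CTU is uniformly split down to cu_size.

    Assumption (not from the source): the whole CTU is split to the same depth.
    """
    if cu_size <= 0 or ctu_size % cu_size != 0:
        raise ValueError("cu_size must be a positive divisor of ctu_size")
    per_side = ctu_size // cu_size
    # Each quadtree level halves the side length, so the ratio must be a power of two.
    if per_side & (per_side - 1) != 0:
        raise ValueError("ctu_size / cu_size must be a power of two")
    return per_side * per_side


if __name__ == "__main__":
    for size in (64, 32, 16):
        print(f"{size}x{size} CUs in a 64x64 CTU:", quadtree_cu_count(64, size))
```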
- each CU is analyzed to determine the prediction type used for the CU, such as an inter prediction type or an intra prediction type.
- the CU is split into one or more prediction units (PU).
- each PU includes a luma prediction block (PB) and two chroma PB.
- the prediction operation in encoding is performed in units of prediction blocks.
- the prediction block includes a matrix of pixel values (for example, brightness values), such as 8×8 pixels, 16×16 pixels, 8×16 pixels, 16×8 pixels, and so on.
- a video encoding method is proposed; this embodiment is described mainly using the example of the method being applied to the terminal device in FIG. 1. Specifically, it can include the following steps:
- Step S302: Obtain a target coding unit to be coded.
- the target coding unit is a coding unit for which it must be determined whether division is needed.
- video frames can be obtained, and each video frame image can be divided into multiple largest coding units (LCUs) of a preset size. It is then judged recursively whether each coding unit can be divided into smaller coding units (CUs), until the smallest coding unit is reached, thereby forming a coding tree unit (CTU) structure.
- the target coding unit may be an LCU or a coding unit obtained by dividing the LCU.
- the terminal device may divide the video image frame into multiple 64*64 pixel LCUs, and may use an LCU as the target coding unit.
- when the LCU is divided to obtain 4 coding units (32*32 pixels), each of these 4 coding units can in turn be used as the target coding unit.
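The division of a frame into 64*64 LCUs and of one LCU into four 32*32 coding units can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the `(x, y, size)` tuple representation and the function names are hypothetical, and the frame dimensions are assumed to be multiples of the LCU size.

```python
from typing import List, Tuple

# Hypothetical block representation: (x, y, size) of a square block in pixels.
Block = Tuple[int, int, int]


def tile_into_lcus(width: int, height: int, lcu_size: int = 64) -> List[Block]:
    """List the LCUs covering a frame, in raster order.

    Assumption: width and height are multiples of lcu_size.
    """
    return [
        (x, y, lcu_size)
        for y in range(0, height, lcu_size)
        for x in range(0, width, lcu_size)
    ]


def split_into_four(block: Block) -> List[Block]:
    """Divide a square block into its four equal sub coding units."""
    x, y, size = block
    half = size // 2
    return [
        (x, y, half),                # top-left
        (x + half, y, half),         # top-right
        (x, y + half, half),         # bottom-left
        (x + half, y + half, half),  # bottom-right
    ]


if __name__ == "__main__":
    lcus = tile_into_lcus(128, 64)
    print(lcus)
    print(split_into_four(lcus[0]))
```

Either an LCU from `tile_into_lcus` or a sub-block from `split_into_four` could then serve as the target coding unit for the following steps.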
- Step S304: Calculate the target pixel gradient data according to the pixel values of the pixels corresponding to the target coding unit, where the target pixel gradient data is obtained according to the difference between the pixel value of a pixel and a reference pixel value.
- the pixel gradient reflects the change of the pixel value in the image.
- the pixel gradient can be obtained by using the difference between the pixel value of the pixel and the reference pixel value. The greater the difference, the greater the gradient.
- the difference can be represented by a difference or a ratio. For example, the pixel value of a pixel can be subtracted from the reference pixel value, and the absolute value of the difference can be regarded as the pixel difference. It is also possible to divide the pixel value of the pixel by the reference pixel value, and use the obtained ratio as the pixel difference.
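The two ways of expressing the pixel difference described above (absolute difference or ratio) can be sketched as a small helper. The function name and the `mode` parameter are hypothetical, introduced only for illustration; the zero-reference check is an added safeguard that the text does not discuss.

```python
def pixel_difference(pixel_value: float, reference_value: float, mode: str = "abs") -> float:
    """Pixel difference as an absolute difference ("abs") or as a ratio ("ratio")."""
    if mode == "abs":
        # Absolute value of (pixel value - reference pixel value).
        return abs(pixel_value - reference_value)
    if mode == "ratio":
        # Pixel value divided by the reference pixel value.
        if reference_value == 0:
            raise ValueError("ratio is undefined for a zero reference value")
        return pixel_value / reference_value
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    print(pixel_difference(120, 100))
    print(pixel_difference(150, 100, mode="ratio"))
```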
- the pixel gradient data is data related to pixel gradients, and may include, for example, at least one of the pixel gradient itself or the difference between the pixel gradients.
- the reference pixel value is the pixel value referred to in determining the pixel gradient.
- the reference pixel value can be obtained according to the pixel values of the target coding unit. For example, the pixel value of any pixel in the target coding unit can be selected as the reference pixel value, or the average of the pixel values of the target coding unit can be calculated and used as the reference pixel value.
- the difference between the pixel value of each pixel and the reference pixel value, that is, the pixel gradient corresponding to each pixel, can be calculated, and the target pixel gradient data is obtained from the pixel gradients of the pixels.
- for example, the pixel gradients of the pixels can be accumulated to obtain the target pixel gradient data.
- the reference pixel points corresponding to different pixels may be different, and the pixel value of the reference pixel point corresponding to the pixel point may be used as the reference pixel value.
- the pixel value of the adjacent pixel of the pixel can be used as the reference pixel value.
- the terminal device can use the pixel value of one of the adjacent pixels as the reference pixel value, or it can integrate the pixel values of multiple adjacent pixels as the reference pixel value, for example, the pixel average value of the adjacent pixels is used as the reference pixel value.
- There may be one or more pieces of target pixel gradient data.
- the target pixel gradient data may include at least one of global gradient data or local gradient data.
- the global gradient data is obtained based on the difference between the pixel value of the pixel and the pixel average value, that is, the reference pixel value may be the pixel average value to reflect the overall pixel change of the target coding unit.
- The local gradient data is obtained based on the difference between the pixel value of a pixel and the pixel values of other pixels in the target coding unit. That is, when calculating the pixel gradient corresponding to a pixel, the difference between the pixel value of that pixel and the pixel values of other pixels in the target coding unit is used, where the other pixel serving as the reference pixel may be a pixel adjacent to that pixel.
- Step S306: Determine the target division decision result corresponding to the target coding unit according to the target pixel gradient data.
- The target division decision result may be to terminate division or to divide. That is, the candidate division decision results may include terminating division and dividing. Terminating division means that the target coding unit is not divided. Dividing means that the target coding unit needs to be divided into multiple sub coding units.
- The target division decision result may also be undetermined. In that case, whether the target coding unit needs to be divided may be determined by the traditional method for deciding coding unit division. For example, the rate-distortion costs of the sub coding units obtained by dividing the target coding unit can be calculated and summed, and the sum compared with the rate-distortion cost of the undivided target coding unit. If the sum of the rate-distortion costs of the sub coding units is smaller, the target coding unit needs to be divided. If the rate-distortion cost of the undivided target coding unit is smaller, the target coding unit does not need to be divided.
- The target pixel gradient data can be compared with preset thresholds. If it is less than the preset first gradient threshold, the gradient or gradient difference is relatively small, meaning that the content of the target coding unit changes little, so the division decision result can be to terminate division. If it is greater than the preset second gradient threshold, the gradient or gradient difference is relatively large, and the division decision result may be division.
- The second gradient threshold may be greater than or equal to the first gradient threshold.
- When there are multiple pieces of target gradient data, the division decision result may be to terminate division if all of them, or more than a preset number of them, are less than the preset first gradient threshold. Likewise, the division decision result may be division if all of them, or more than a preset number of them, are greater than the preset second gradient threshold.
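The threshold comparison of step S306, together with the rate-distortion fallback for the undetermined case, can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names, the string labels, the `min_count` parameter, and the use of "at least `min_count`" as the counting rule are all hypothetical choices made for the example.

```python
from typing import Optional, Sequence

# Hypothetical labels for the three possible outcomes described in the text.
TERMINATE, DIVIDE, UNDETERMINED = "terminate", "divide", "undetermined"


def decide_division(gradients: Sequence[float],
                    first_threshold: float,
                    second_threshold: float,
                    min_count: Optional[int] = None) -> str:
    """Compare gradient data with two thresholds (second >= first)."""
    if not gradients:
        raise ValueError("at least one piece of gradient data is required")
    if second_threshold < first_threshold:
        raise ValueError("second threshold must be >= first threshold")
    # Assumption: by default every piece of gradient data must agree.
    needed = len(gradients) if min_count is None else min_count
    below = sum(g < first_threshold for g in gradients)
    above = sum(g > second_threshold for g in gradients)
    # Assumption: the terminate check is evaluated before the divide check.
    if below >= needed:
        return TERMINATE
    if above >= needed:
        return DIVIDE
    return UNDETERMINED


def resolve_with_rd_cost(decision: str,
                         undivided_cost: float,
                         sub_unit_costs: Sequence[float]) -> str:
    """Fall back to the rate-distortion comparison when undetermined."""
    if decision != UNDETERMINED:
        return decision
    return DIVIDE if sum(sub_unit_costs) < undivided_cost else TERMINATE
```

Only the undetermined case pays for the rate-distortion evaluation, which is where the time saving described in the text would come from.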
- Step S308: Perform video encoding on the target coding unit according to the target division decision result.
- the target coding unit may be video-encoded according to the target division decision result.
- When the target division decision result is to terminate division, the target coding unit is used as the coding unit for video coding.
- If the division decision result corresponding to the target coding unit is to terminate division, the subsequent division process of the target coding unit is terminated directly, and the rate-distortion cost is no longer used to determine whether the target coding unit needs to be divided, so coding time is saved.
- The terminal device may use the target coding unit as the unit for obtaining the reference block (reference unit), obtain the reference block corresponding to the target coding unit, obtain the predicted pixel values of the target coding unit according to the reference block, and subtract the predicted pixel values from the real pixel values of the target coding unit to obtain the prediction residual.
- The prediction residual is quantized and entropy-coded to obtain the encoded data, which is stored locally or transmitted to another terminal device.
- When the target division decision result is division, the terminal device divides the target coding unit to obtain multiple sub coding units, and performs video encoding according to the sub coding units.
- In this case, the process of performing intra-frame prediction on the target coding unit to determine its rate-distortion cost can be omitted and the target coding unit divided directly, so the time spent computing the rate-distortion cost of the target coding unit is saved.
- The division method can be set as needed, for example, quad tree division, ternary tree division, or binary tree division.
- M-ary tree division refers to dividing the target coding unit into M parts. M can be an integer greater than or equal to 2.
- For a sub coding unit obtained by division, if it is the smallest coding unit specified in the video coding standard, division need not continue. If it is not the smallest coding unit specified in the video coding standard, the sub coding unit can be used as the new target coding unit, and the division decision can be continued.
- In the above method, the target pixel gradient data is calculated from the pixel values of the target coding unit.
- Because the target pixel gradient data is obtained according to the difference between the pixel value of a pixel and the reference pixel value, it can reflect the image change of the target coding unit.
- the target pixel gradient data has high accuracy for determining the target division decision result corresponding to the target coding unit, and can reduce the time for obtaining the division decision result, thus shortening the coding time of video coding and improving the video coding efficiency.
- Figure 4 shows a schematic diagram of the division of a 64*64 pixel image block.
- a square represents a coding block.
- A 64*64 pixel image block can be used as the target coding unit, and the method provided in the embodiment of the present application executed; suppose the resulting target division decision result is division. The 64*64 pixel image block can then be divided into 4 sub coding units. These 4 sub coding units are taken as new target coding units, and the method provided in the embodiment of this application continues to be executed until 4*4 pixel coding units are obtained. Because the 4*4 pixel coding unit is the smallest coding unit of the video coding standard, division decisions can stop there.
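The recursion described for Figure 4 can be sketched as below. This is an illustrative sketch only: `quadtree_partition` and the `should_divide` callback are hypothetical names, the callback stands in for whatever division decision is used (for example a gradient-based one), and only quad tree division of square blocks is modelled.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square block, in pixels


def quadtree_partition(x: int, y: int, size: int,
                       should_divide: Callable[[int, int, int], bool],
                       min_size: int = 4) -> List[Block]:
    """Recursively divide a square block into four until told to stop.

    A block is kept as a leaf when it has reached the minimum coding
    unit size or when the decision callback says not to divide.
    """
    if size <= min_size or not should_divide(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves: List[Block] = []
    for dy in (0, half):          # upper row of sub-blocks, then lower row
        for dx in (0, half):      # left sub-block, then right sub-block
            leaves.extend(
                quadtree_partition(x + dx, y + dy, half, should_divide, min_size))
    return leaves
```

For instance, calling `quadtree_partition(0, 0, 64, lambda x, y, s: True)` divides all the way down to 4*4 blocks, while a callback that always returns `False` leaves the 64*64 block undivided.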
- Step S304, that is, calculating the target pixel gradient data according to the pixel values of the pixels corresponding to the target coding unit, includes the following steps:
- Step S502: Determine the current pixel in the target coding unit, and obtain the pixel value of the target neighboring pixel corresponding to the current pixel as the reference pixel value.
- The current pixel refers to a pixel in the target coding unit whose pixel gradient currently needs to be calculated.
- All pixels can be treated as current pixels at the same time, or pixels can be taken as the current pixel one at a time or several at a time.
- the target adjacent pixel can be one or more of the pixels adjacent to the current pixel. For example, as shown in Figure 6, a small square represents a pixel.
- the target neighboring pixels can be all pixels from A to H, or one or more of the pixels from A to H can be selected as the target neighboring pixels.
- the target gradient calculation direction may be obtained, and the adjacent pixel point corresponding to the current pixel in the target gradient calculation direction may be obtained as the target adjacent pixel point.
- the target gradient calculation direction refers to the direction of the gradient calculation, and the gradient calculation direction may be determined as required, for example, may include at least one of a horizontal direction, a vertical direction, a 45-degree direction, or a 135-degree direction.
- The horizontal direction can be from left to right or from right to left, and the vertical direction can be from top to bottom or from bottom to top.
- The 45-degree direction refers to the direction from the lower left corner to the upper right corner of the target coding unit, and the 135-degree direction refers to the direction from the lower right corner to the upper left corner of the target coding unit.
- FIG. 7 is a schematic diagram of the gradient calculation directions in some embodiments.
- The neighboring pixel corresponding to the current pixel is the forward neighboring pixel of the current pixel in the target gradient calculation direction.
- The forward neighboring pixel is the adjacent pixel that comes before the current pixel along the target gradient calculation direction. For example, assuming that the target gradient calculation direction is the horizontal direction in FIG. 7, for the current pixel X1, the target neighboring pixel is H. Assuming that the target gradient calculation direction is the vertical direction in FIG. 7, for the current pixel X1, the target neighboring pixel is B. Assuming that the target gradient calculation direction is the 45-degree direction in FIG. 7, for the current pixel X1, the target neighboring pixel is G. Assuming that the target gradient calculation direction is the 135-degree direction in FIG. 7, for the current pixel X1, the target neighboring pixel is E.
- Step S504: Calculate the difference between the pixel value of the current pixel and the reference pixel value to obtain the pixel value difference corresponding to the current pixel.
- The pixel value difference can be obtained from the result of subtracting the reference pixel value from the pixel value of the current pixel; for example, it can be the absolute value of that result or its square.
- Step S506: Compute a statistic over the pixel value differences corresponding to the pixels in the target coding unit to obtain the target pixel gradient data.
- The statistic can be, for example, the average or the median. For example, the pixel value differences corresponding to the pixels in the target coding unit are summed, and the sum is divided by the number of pixels to obtain the target pixel gradient data.
- The target pixel gradient data may include local pixel gradients corresponding to one or more target gradient calculation directions.
- For example, it may include the local pixel gradients corresponding to the horizontal direction, the vertical direction, the 45-degree direction, and the 135-degree direction respectively.
- the local pixel gradients corresponding to the horizontal direction, the vertical direction, the 45 degree direction, or the 135 degree direction can be expressed by formulas (1) to (4) respectively.
- LocalGradient represents the local pixel gradient
- LocalGradient_HOR represents the local pixel gradient in the horizontal direction.
- LocalGradient_VER represents the local pixel gradient in the vertical direction.
- LocalGradient_45 represents the local pixel gradient in the direction of 45 degrees.
- LocalGradient_135 represents the local pixel gradient in the direction of 135 degrees. Width represents the number of pixels in the width direction, that is, in the horizontal direction, and Height represents the number of pixels in the height direction, that is, in the vertical direction.
- P represents the pixel value; for example, P_{i,j} represents the pixel value in the i-th row and j-th column of the target coding unit. abs denotes the absolute value, and Σ denotes summation.
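Steps S502 to S506 for the four directions can be sketched as below. Formulas (1) to (4) are not reproduced in this text, so this is only one plausible reading, under these assumptions: the neighbour layout around the current pixel is A B C / H X D / G F E (inferred from the H, B, G, E examples above), pixels whose forward neighbour lies outside the target coding unit are skipped, and the sum of absolute differences is divided by the number of pixel pairs actually used. The names `DIRECTION_OFFSETS` and `local_gradient` are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# (row offset, column offset) from the current pixel to its forward neighbour.
# Assumed layout:  A B C
#                  H X D
#                  G F E
DIRECTION_OFFSETS: Dict[str, Tuple[int, int]] = {
    "HOR": (0, -1),   # H: left neighbour
    "VER": (-1, 0),   # B: upper neighbour
    "45": (1, -1),    # G: lower-left neighbour
    "135": (1, 1),    # E: lower-right neighbour
}


def local_gradient(block: Sequence[Sequence[int]], direction: str) -> float:
    """Mean absolute difference between each pixel and its forward neighbour."""
    height, width = len(block), len(block[0])
    di, dj = DIRECTION_OFFSETS[direction]
    total = 0
    count = 0
    for i in range(height):
        for j in range(width):
            ni, nj = i + di, j + dj
            # Assumption: skip pixels whose neighbour is outside the unit.
            if 0 <= ni < height and 0 <= nj < width:
                total += abs(block[i][j] - block[ni][nj])
                count += 1
    return total / count if count else 0.0
```

A block made of horizontal stripes gives a zero horizontal gradient and a non-zero vertical gradient, which is the kind of directional contrast the per-direction data is meant to expose.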
- Step S304, that is, calculating the target pixel gradient data according to the pixel values of the pixels corresponding to the target coding unit, includes the following steps: dividing the target coding unit into multiple regions and calculating the pixel gradient corresponding to each region, where the pixel gradient is obtained according to the difference between the pixel values of the pixels in the region and the reference pixel value; and calculating the pixel gradient difference between the regions to obtain the target pixel gradient data.
- Multiple means two or more.
- The way of dividing the target coding unit into multiple regions can be set as required; for example, the target coding unit can be divided into two regions.
- The division into two regions may be at least one of horizontal division, vertical division, or diagonal division.
- When the target coding unit is a square block, that is, when the number of pixels in the length direction and the width direction is the same, horizontal division, vertical division, and diagonal division can be performed.
- When the target coding unit is a non-square block: if the width is greater than the height, vertical division can be performed, dividing the target coding unit into a left half and a right half; if the height is greater than the width, horizontal division can be performed, dividing the target coding unit into an upper half and a lower half.
- The reference pixel value may be a pixel average value.
- The pixel average value may be the average over the pixels of the target coding unit, or the average over the pixels of the region.
- The reference pixel value can also be the pixel value of a pixel adjacent to the current pixel.
- The pixel gradient difference between the regions can be represented by the absolute value of the difference, the square of the difference, or the ratio of the pixel gradients of the two regions.
- the pixel gradient corresponding to the area may be a statistical value of the pixel difference corresponding to each pixel point in the area, for example, the sum or average value of the pixel difference corresponding to each pixel point in the area.
- calculating the pixel gradient difference between the regions includes: subtracting the pixel gradient corresponding to the second region from the pixel gradient corresponding to the first region to obtain the pixel gradient difference between the first region and the second region.
- the first area and the second area may be any area corresponding to the target coding unit.
- the first area may be the upper half of the target coding unit, and the second area may be the lower half of the target coding unit.
- the gradient is directional, for example, it may include a gradient in a horizontal direction and a gradient in a vertical direction.
- the gradients corresponding to different directions have different meanings. Therefore, when performing gradient comparison, such as gradient subtraction, the pixel gradients in the same direction are subtracted. For example, the pixel gradient in the vertical direction in the first area is subtracted from the pixel gradient in the vertical direction in the second area.
- the target pixel gradient data may include gradient differences of global pixel gradients corresponding to one or more gradient calculation directions.
- For example, it may include the gradient differences of the global pixel gradients corresponding to the horizontal direction, the vertical direction, the 45-degree direction, and the 135-degree direction respectively.
- The global pixel gradient difference in the horizontal direction may be the difference in global pixel gradient between the left half and the right half.
- The global pixel gradient difference in the vertical direction may be the difference in global pixel gradient between the upper half and the lower half.
- The global pixel gradient difference in the 45-degree direction may be the difference in global pixel gradient between the lower left part and the upper right part.
- The global pixel gradient difference in the 135-degree direction may be the difference in global pixel gradient between the upper left part and the lower right part.
- the global pixel gradient differences corresponding to the horizontal direction, the vertical direction, the 45-degree direction, and the 135-degree direction can be expressed by formulas (5) to (8), respectively.
- GlobalGradient represents the global pixel gradient.
- GlobalGradient_HOR represents the global pixel gradient difference in the horizontal direction.
- GlobalGradient_VER represents the global pixel gradient difference in the vertical direction.
- GlobalGradient_45 represents the global pixel gradient difference in the 45-degree direction.
- GlobalGradient_135 represents the global pixel gradient difference in the 135-degree direction.
- N represents the number of pixels in the length direction and the width direction.
- Here, the case where the number of pixels in the length direction and the width direction of the target coding unit is the same is taken as an example; the numbers of pixels in the two directions can also be different.
- abs denotes the absolute value, and Σ denotes summation.
- P represents the pixel value; for example, P_{i,j} represents the pixel value in the i-th row and j-th column of the target coding unit.
- P_avg represents the pixel average value.
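The region-based global gradient differences can be illustrated for an N*N block as follows. Formulas (5) to (8) are not reproduced in this text, so the sketch rests on assumptions: P_avg is the average over the whole target coding unit, the gradient of a region is the mean of abs(P_{i,j} - P_avg) over that region, the difference is taken as an absolute value, and pixels lying exactly on a diagonal dividing line belong to neither part. The function names are hypothetical.

```python
from typing import Callable, Dict, Sequence


def region_gradient(block: Sequence[Sequence[float]],
                    in_region: Callable[[int, int], bool],
                    p_avg: float) -> float:
    """Mean absolute deviation from p_avg over the pixels of one region."""
    diffs = [abs(block[i][j] - p_avg)
             for i in range(len(block))
             for j in range(len(block[0]))
             if in_region(i, j)]
    return sum(diffs) / len(diffs) if diffs else 0.0


def global_gradient_differences(block: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Absolute gradient difference between the two parts, per direction."""
    n = len(block)
    if n == 0 or any(len(row) != n for row in block):
        raise ValueError("this sketch only handles non-empty square blocks")
    p_avg = sum(sum(row) for row in block) / (n * n)
    half = n // 2
    regions = {
        "HOR": (lambda i, j: j < half, lambda i, j: j >= half),        # left / right
        "VER": (lambda i, j: i < half, lambda i, j: i >= half),        # upper / lower
        "45": (lambda i, j: i > j, lambda i, j: i < j),                # lower-left / upper-right
        "135": (lambda i, j: i + j < n - 1, lambda i, j: i + j > n - 1),  # upper-left / lower-right
    }
    return {name: abs(region_gradient(block, first, p_avg)
                      - region_gradient(block, second, p_avg))
            for name, (first, second) in regions.items()}
```

A uniform or evenly split block yields differences of zero in every direction, while a block whose detail is concentrated in one corner yields a non-zero difference for the divisions that separate that corner from the rest.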
- Determining the target division decision result corresponding to the target coding unit according to the target pixel gradient data includes: when the target pixel gradient data meets the first threshold condition, determining that the target division decision result corresponding to the target coding unit is to terminate division.
- The first threshold condition includes that the number of pieces of target pixel gradient data smaller than their corresponding first thresholds satisfies the first number.
- Satisfying the first number means being greater than or equal to the first number.
- The first number can be set as needed. For example, it can be the number of all the target pixel gradient data, or the number of a part of them; the first number can also be a preset value, for example 5. If the number of pieces of target pixel gradient data that are less than their corresponding first thresholds is greater than or equal to the first number, the image content of the target coding unit changes little and the condition for not dividing is satisfied, so division can be terminated.
- The first threshold can be obtained based on experience or experiment. For example, statistics can be collected on the target gradient data of coding units that are divided and of coding units whose division is terminated in video frames of different content, and a value for which the accuracy of the decision to terminate division meets a preset accuracy rate, such as 90%, is then selected as the first threshold.
- different quantization parameters may correspond to different first thresholds, and target coding units of different sizes may also correspond to different first thresholds.
- In each case, the gradient value at which the decision to terminate division meets the preset accuracy rate is used as the first threshold.
- different target pixel gradient data can correspond to different first thresholds.
- The target pixel gradient data in the 45-degree direction and the 135-degree direction are larger than the target pixel gradient data in the horizontal and vertical directions. Therefore, a larger first threshold can be set for them; that is, the first threshold corresponding to the target pixel gradient data in the 45-degree and 135-degree directions is greater than the first threshold corresponding to the target pixel gradient data in the horizontal and vertical directions.
- For example, the first threshold corresponding to the target pixel gradient data in the 45-degree and 135-degree directions may be a times the first threshold corresponding to the target pixel gradient data in the horizontal direction, where a is greater than 1, for example 1.414.
- the first threshold corresponding to the global gradient and the first threshold corresponding to the local gradient may be different.
- the first threshold may be variable.
- the corresponding first threshold may be preset.
- The target gradient data corresponding to the adjacent coding tree units (CTUs) can be obtained, and the first threshold derived from it.
- For example, the largest gradient data among the upper and left CTUs of the current CTU may be used as the first threshold.
- The average gradient data of the upper and left CTUs can also be used as the first threshold.
- Updating the first threshold with the target gradient data of the adjacent CTUs lets the judgment conditions of the division decision change dynamically with the image content, which improves the accuracy of the division decision result.
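The two ways of choosing the first threshold described above (a preset value scaled per direction, or a value derived from the upper and left CTUs) might look like the sketch below. The function names, the `mode` parameter, and the fallback to a preset threshold when no neighbouring CTU is available are assumptions made for illustration; the text does not say how missing neighbours are handled.

```python
from typing import Dict, Optional


def adaptive_first_threshold(upper_ctu_gradient: Optional[float],
                             left_ctu_gradient: Optional[float],
                             preset_threshold: float,
                             mode: str = "max") -> float:
    """Derive the first threshold from the upper and left CTU gradient data."""
    available = [g for g in (upper_ctu_gradient, left_ctu_gradient)
                 if g is not None]
    if not available:
        # Assumption: with no neighbouring CTU, use the preset threshold.
        return preset_threshold
    if mode == "max":
        return max(available)
    if mode == "mean":
        return sum(available) / len(available)
    raise ValueError("mode must be 'max' or 'mean'")


def directional_first_thresholds(base_threshold: float,
                                 a: float = 1.414) -> Dict[str, float]:
    """Scale the diagonal thresholds by a factor a > 1 (1.414 in the text)."""
    if a <= 1:
        raise ValueError("a must be greater than 1")
    return {
        "HOR": base_threshold,
        "VER": base_threshold,
        "45": a * base_threshold,
        "135": a * base_threshold,
    }
```

The two helpers can be combined, for example by feeding the adaptive value in as `base_threshold`, although the text does not state whether the embodiments combine them.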
- Determining the target division decision result corresponding to the target coding unit according to the target pixel gradient data includes: when the target pixel gradient data meets the second threshold condition, determining that the target division decision result corresponding to the target coding unit is division. The second threshold condition includes that the number of pieces of target pixel gradient data greater than their corresponding second thresholds satisfies the second number.
- Satisfying the second number means being greater than or equal to the second number.
- the first threshold and the second threshold may be the same or different.
- the second number can be set according to needs, for example, it can be the number corresponding to all the target pixel gradient data, it can also be a part, or it can be a preset value. If the target pixel gradient data that is greater than the corresponding second threshold meets the second quantity, it means that the image content of the target coding unit has a large change and satisfies the dividing condition, so the target coding unit can be divided.
- the terminal device may determine whether to use the first threshold condition or the second threshold condition to determine the target division decision result for different situations.
- The threshold condition and the corresponding division decision result can be determined according to the number of sub coding units obtained when the target coding unit is divided. For example, for quad tree (QT) division, it can be determined whether the target pixel gradient data meets the first threshold condition, for example, whether all target pixel gradient data are less than the first threshold; if so, the target division decision result corresponding to the target coding unit is determined to be to terminate division.
- the division mode of the coding unit can be set as required.
- the coding unit can be set to perform quad tree division, or multiple division modes can be set.
- VVC stands for Versatile Video Coding.
- More complex and diverse binary tree, ternary tree, and quad tree division methods can be used for the division of coding units, so that the amount of encoded data is smaller.
- The coding block (i.e., coding unit) can have a maximum size of 128x128 pixels and a minimum size of 4x4 pixels.
- The coding block can first be divided by quad tree, and then binary tree and ternary tree division can be performed on the leaf nodes of the quad tree division.
- The minimum size of a quad tree leaf node is 16x16 pixels.
- The division can be decided following the process shown in the figure.
- The recursive process of quad tree division is performed first, and then the leaf node blocks of the quad tree division are traversed in turn with vertical and horizontal binary tree and ternary tree division until the optimal division result of all CU blocks in the current CTU is found. Therefore, in the process of QT (quad tree) division, TT (ternary tree) division, and BT (binary tree) division, the method provided in the embodiment of this application can be used to quickly determine the division decision result.
- MT denotes multi-type tree (Multiple-type Tree) division.
- MTdepth represents the depth of MT division.
- QTdepth represents the depth of QT division. For each additional division, the depth is increased by 1.
- Step S304, that is, calculating the target pixel gradient data according to the pixel values of the pixels corresponding to the target coding unit, includes: dividing the target coding unit to obtain multiple sub coding units; and calculating the pixel gradient corresponding to each sub coding unit to obtain the target pixel gradient data, where the pixel gradient corresponding to a sub coding unit is obtained according to the difference between the pixel values of the pixels in the sub coding unit and the reference pixel value.
- The division method can be set as needed; for example, the unit can be divided into four or three parts.
- The division method can be determined according to the current division process. If it is currently being judged whether the target coding unit should undergo ternary tree division, the target coding unit may be divided into 3 sub coding units.
- The sub coding units obtained by division may meet the size requirement for coding units in the video coding standard, or may be smaller than the minimum coding unit.
- The target coding unit can be divided into multiple sub-blocks according to the size of the smallest coding unit.
- For example, the coding unit can be divided into 8 sub coding units of 4*4 pixels each.
- the pixel gradient of the sub-coding unit may include at least one of a global gradient or a local gradient.
- the pixel gradient data of the sub-encoding unit can be used as the target pixel gradient data.
- Determining the target division decision result corresponding to the target coding unit according to the target pixel gradient data includes: calculating the pixel gradient data differences between the sub coding units, and determining the target division decision result corresponding to the target coding unit according to those differences.
- The gradient data difference can be obtained as a ratio or as a subtraction result; for example, it can be the absolute value of the subtraction result or its square.
- The terminal device may calculate the pixel gradient data difference between every two sub coding units. It may also calculate at least one of the pixel gradient data difference in the horizontal direction or in the vertical direction.
- the pixel gradient data difference in the horizontal direction may be the difference between the pixel gradient data of the left sub-block (sub-encoding unit) and the adjacent right sub-block.
- The pixel gradient data difference in the vertical direction may be the difference between the pixel gradient data of the upper sub-block and the adjacent lower sub-block.
- For example, as shown in the figure, the target coding unit is divided into 4 sub coding units, numbered 0, 1, 2, and 3.
- the pixel gradient difference in the horizontal direction may include at least one of the pixel gradient data difference between the sub-encoding unit 0 and the sub-encoding unit 1, or the pixel gradient data difference between the sub-encoding unit 2 and the sub-encoding unit 3.
- the pixel gradient data difference in the vertical direction may include at least one of the pixel gradient data difference between the sub-encoding unit 0 and the sub-encoding unit 2, or the pixel gradient data difference between the sub-encoding unit 1 and the sub-encoding unit 3.
- the difference in pixel gradient data between the sub-coding units can be represented by a difference or a ratio.
- For example, if the pixel gradient data of two sub coding units are x and y, the ratio may be x divided by y, or y divided by x.
- The third threshold condition includes that the number of pixel gradient data differences greater than the corresponding third threshold satisfies the third number.
- Satisfying the third number means being greater than or equal to the third number.
- The third threshold can be set as required, for example, obtained based on experience or experiment. For example, statistics can be collected on the differences of the target gradient data of coding units that are divided and of coding units whose division is terminated in video frames of different content, and a gradient data difference value for which the accuracy of the decision to divide meets a preset accuracy rate, for example 98%, is then selected.
- The third threshold can also be adjusted adaptively according to the length and width of the coding block and the ratio of the length to the width. For example, when collecting these statistics on video frames of different content, different coding block sizes can be counted separately, so as to obtain the thresholds corresponding to coding units of different sizes.
- The third number can be set as needed; for example, it can be the number of all the pixel gradient data differences, or the number of a part of them. For example, as long as one of the pixel gradient data differences is greater than the third threshold, the target division decision result corresponding to the target coding unit is division.
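The sub-block comparison and the third threshold condition can be sketched for the four-way case as follows. The layout (0 upper left, 1 upper right, 2 lower left, 3 lower right) is inferred from the horizontal pairs (0, 1), (2, 3) and vertical pairs (0, 2), (1, 3) named above; the function names, the use of the absolute difference rather than the ratio, and the default third number of 1 are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def sub_block_gradient_differences(
        gradients: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Absolute gradient differences between adjacent sub coding units.

    `gradients` holds the pixel gradient data of sub coding units 0..3,
    assumed to be laid out as:  0 1
                                2 3
    """
    if len(gradients) != 4:
        raise ValueError("this sketch expects exactly four sub coding units")
    g0, g1, g2, g3 = gradients
    horizontal = [abs(g0 - g1), abs(g2 - g3)]  # left vs. adjacent right
    vertical = [abs(g0 - g2), abs(g1 - g3)]    # upper vs. adjacent lower
    return horizontal, vertical


def meets_third_threshold_condition(differences: Sequence[float],
                                    third_threshold: float,
                                    third_number: int = 1) -> bool:
    """True when at least `third_number` differences exceed the threshold."""
    exceeding = sum(d > third_threshold for d in differences)
    return exceeding >= third_number
```

With `third_number=1`, a single large difference between neighbouring sub-blocks is enough for the decision to be division, matching the example in the preceding paragraph.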
- If the pixel gradient data of the sub coding units differ greatly, the contents of the sub coding units are relatively different. A separate reference block therefore needs to be obtained for each sub coding unit to predict its value, so that the prediction residual is smaller. Consequently, the division decision result can be division, which reduces the amount of coded data obtained by encoding.
- Determining the target division decision result corresponding to the target coding unit according to the target pixel gradient data includes: obtaining the target number of sub coding units whose pixel gradient data is greater than a fourth threshold; and when the target number exceeds the fourth number, or when the ratio of the target number to the number of sub coding units exceeds the first ratio, determining that the target division decision result corresponding to the target coding unit is division.
- the first ratio can be set as required, for example, it can be one-quarter.
- the fourth number can also be set as required, for example, it can be 5.
- When the target number exceeds the fourth number, or when the ratio of the target number to the number of sub coding units exceeds the first ratio, the image content in the sub coding units changes relatively quickly. Therefore, the target coding unit needs to be divided so that coding can be performed in units of sub coding units, or the sub coding units can be divided further, to reduce the prediction residual.
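The counting rule based on the fourth threshold can be sketched as below. The function name and the choice to make both the fourth number and the first ratio optional are assumptions; "exceeds" is read as strictly greater than.

```python
from typing import Optional, Sequence


def divide_by_sub_unit_count(sub_unit_gradients: Sequence[float],
                             fourth_threshold: float,
                             fourth_number: Optional[int] = None,
                             first_ratio: Optional[float] = None) -> bool:
    """Decide division from how many sub coding units have a large gradient."""
    if not sub_unit_gradients:
        raise ValueError("at least one sub coding unit is required")
    # Target number: sub coding units whose gradient exceeds the threshold.
    target_number = sum(g > fourth_threshold for g in sub_unit_gradients)
    if fourth_number is not None and target_number > fourth_number:
        return True
    if (first_ratio is not None
            and target_number / len(sub_unit_gradients) > first_ratio):
        return True
    return False
```

With eight sub coding units and a first ratio of one quarter, two units above the threshold are not enough (2/8 does not exceed 1/4), while three are.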
- the step of how to determine the target division decision result provided in the embodiment of the present application may be applied before the step of determining the division depth of the coding unit by using the rate-distortion cost.
- For example, determining the coding unit with a mixed tree structure (such as quad tree, ternary tree, and binary tree) improves coding efficiency but also greatly increases coding time, which is very unfavorable for accelerating the encoder.
- When the rate-distortion cost is used, each mode must perform the intra-frame prediction process in that mode.
- Intra-frame prediction includes 67 intra-frame direction predictions and many intra-frame prediction techniques, and the large number of rate-distortion optimization decisions greatly increases the computational complexity of the encoder.
- By using the structural characteristics of the binary tree, the ternary tree, and the quad tree, together with the content information of the coding block, it is possible to decide in advance whether the current CU block should be divided or not. This effectively reduces the tentative traversal of the various division structures in the CU block, greatly saves coding time, and improves the compression efficiency of the video standard.
- different decision-making methods may be provided for square blocks (that is, the target coding unit is a square) or non-square blocks.
- it is the first decision-making method for square blocks.
- Thr_G represents the threshold corresponding to the global gradient
- Thr_L represents the threshold corresponding to the local gradient.
- "&&" denotes the logical "and" operator.
- "||" denotes the logical "or" operator.
- a and b can be set based on experience or experiment, or obtained by a statistical method.
- For example, the a and b values observed when division is terminated can be collected, and values that make the accuracy of the termination decision higher than a preset accuracy can be selected.
- the values of a and b can also be adjusted adaptively according to the size of the target coding unit. For example, a and b can be 1.414 and 1.2, respectively.
- getRatioFlag represents the ratio.
- GlobalGradientSubBlock_Ver[H] represents the global gradient difference of the sub-coding unit H, for example, GlobalGradientSubBlock_Ver[1] represents the global gradient difference of the sub-coding unit 1.
- NotSplit = 1; // It must not be divided; that is, the division decision result is to terminate division.
- Int HorFlag = getRatioFlag(GlobalGradientSubBlock_Hor[0], GlobalGradientSubBlock_Hor[1])
- Int VerFlag = getRatioFlag(GlobalGradientSubBlock_Ver[0], GlobalGradientSubBlock_Ver[2])
- Thr_G_Limit refers to the gradient threshold corresponding to the global gradient, which can be the same as or different from Thr_G.
- Thr_L_Limit refers to the gradient threshold corresponding to the local gradient, which can be the same as or different from Thr_L.
- DoSplit = 1; // If the minimum of the global pixel gradient differences for the horizontal, vertical, 45-degree, and 135-degree directions is greater than the gradient threshold for the global pixel gradient, and the minimum of the local pixel gradients for the same four directions is greater than the gradient threshold for the local pixel gradient, the division decision result is to divide.
- GlobalGradient_Hor_Left, GlobalGradient_Hor_Right, GlobalGradient_Ver_Above, GlobalGradient_Ver_Below respectively refer to the global gradient difference corresponding to the left half, the global gradient difference corresponding to the right half, the global gradient difference corresponding to the upper half, and the global gradient difference corresponding to the lower half.
- a_hor can be a preset value, which can be determined by referring to the method for obtaining the preset value a.
- DoSplit = 1; // If the global gradient difference corresponding to the left half is a_hor times the global gradient difference corresponding to the right half, or the global gradient difference corresponding to the right half is a_hor times the global gradient difference corresponding to the left half, the division decision result is division.
- DoSplit = 1; // If the global gradient difference corresponding to the upper half is a_hor times the global gradient difference corresponding to the lower half, or the global gradient difference corresponding to the lower half is a_hor times the global gradient difference corresponding to the upper half, the division decision result is division.
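The half-versus-half ratio test described in the two DoSplit comments might look like the sketch below. Several points are assumptions rather than facts from the source: "is a_hor times" is read as "is more than a_hor times", the default of 1.414 is borrowed from the example value given for a, and the function name is hypothetical.

```python
def split_by_half_ratio(grad_first_half, grad_second_half, a_hor=1.414):
    """Return True (DoSplit) when one half's global gradient difference
    is more than a_hor times that of the other half.

    The strict comparison is an assumption; it also keeps a completely
    flat block (both values 0) from triggering division.
    """
    return (grad_first_half > a_hor * grad_second_half
            or grad_second_half > a_hor * grad_first_half)
```

The same function can be called once with the left and right halves and once with the upper and lower halves.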
- It is the second decision-making method for non-square blocks. If the first decision-making method for non-square blocks cannot determine whether to divide, the second decision-making method is used to make the decision with more refined calculations.
- the target coding unit may be divided into the smallest coding units, for example, 4*4 pixels, and the local gradient is calculated for each sub-coding unit separately, using a pixel-by-pixel difference.
- subBlkNum refers to the number of sub coding units.
- Thr_Sub_Limit represents the gradient threshold. That is, the gradient value of each sub-block in the target coding unit can be traversed, and the number of sub-blocks whose gradient is greater than Thr_Sub_Limit can be counted. If more than 1/4 of the sub-blocks have a gradient greater than Thr_Sub_Limit, the division decision result is to divide.
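The sub-block traversal can be sketched as below. The source only says that the local gradient is a pixel-by-pixel difference, so summing absolute differences between horizontally adjacent pixels is an assumed definition; `sub_block_gradients` and `should_divide` are hypothetical names, while `thr_sub_limit` mirrors Thr_Sub_Limit and the length of the returned list plays the role of subBlkNum.

```python
def sub_block_gradients(block, size=4):
    """Local gradient of every size x size sub-block of a 2-D pixel list.

    Assumed definition: sum of absolute differences between each pixel
    and its left neighbour inside the same sub-block.
    """
    height, width = len(block), len(block[0])
    gradients = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            g = 0
            for i in range(top, min(top + size, height)):
                for j in range(left + 1, min(left + size, width)):
                    g += abs(block[i][j] - block[i][j - 1])
            gradients.append(g)
    return gradients


def should_divide(block, thr_sub_limit, size=4):
    """Divide when more than 1/4 of the sub-blocks exceed thr_sub_limit."""
    grads = sub_block_gradients(block, size)
    large = sum(1 for g in grads if g > thr_sub_limit)
    return large * 4 > len(grads)
```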
- the target coding unit may be divided horizontally, vertically, and diagonally. Calculate the global pixel gradient of each area, and calculate the gradient difference of the global pixel gradient corresponding to the horizontal direction, vertical direction, 45 degree direction and 135 degree direction respectively according to formulas (5) to (8).
- the gradient calculation direction may include a horizontal direction, a vertical direction, a 45-degree direction, and a 135-degree direction.
- the local pixel gradients in the horizontal, vertical, 45-degree, and 135-degree directions can be calculated according to formulas (1) to (4).
- Step 4: Determine whether each global pixel gradient difference is less than the corresponding first threshold and whether each local pixel gradient is less than the corresponding first threshold. If so, determine that the target division decision result is to terminate division. If not, go to step 5.
- the pixel gradient difference in the horizontal direction may include at least one of the pixel gradient data difference between the sub-encoding unit 0 and the sub-encoding unit 1, or the pixel gradient data difference between the sub-encoding unit 2 and the sub-encoding unit 3.
- the pixel gradient data difference in the vertical direction may include at least one of the pixel gradient data difference between the sub-encoding unit 0 and the sub-encoding unit 2 or the pixel gradient data difference between the sub-encoding unit 1 and the sub-encoding unit 3.
- Step 6: When the pixel gradient data difference between the sub-coding units meets the third threshold judgment condition, determine that the target division decision result corresponding to the target coding unit is division. If not, go to step 7.
- the target division decision result corresponding to the target encoding unit is division.
- The first rate-distortion cost, for encoding the target coding unit as a complete block, and the second rate-distortion cost, for encoding the target coding unit divided into 4 sub-blocks, can be calculated. If the first rate-distortion cost is larger, the division decision result is division. If the second rate-distortion cost is larger, the division decision result is to terminate division.
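A minimal sketch of this rate-distortion fallback, assuming the gradient-based steps return either a definite result or nothing. The names `final_decision` and `rd_costs` are hypothetical, and treating a tie as "terminate" is an assumption. The costs are supplied through a callable so that the expensive calculation runs only when the fast decision is inconclusive.

```python
def final_decision(fast_result, rd_costs):
    """Combine the gradient-based decision with the rate-distortion fallback.

    fast_result: "divide", "terminate", or None when undetermined.
    rd_costs: callable returning (cost_as_whole_block, cost_as_4_sub_blocks);
              it is invoked only when fast_result is inconclusive.
    """
    if fast_result in ("divide", "terminate"):
        return fast_result
    cost_whole, cost_split = rd_costs()
    # A larger cost for the undivided block favours division.
    return "divide" if cost_whole > cost_split else "terminate"
```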
- the method in the embodiments of the present application may be used to make block division (coding unit division) decisions.
- a video encoding device may be integrated into the above-mentioned terminal device, and may specifically include a target coding unit obtaining module 1102, a target pixel gradient data obtaining module 1104, a target division decision result determination module 1106, and a video encoding module 1108.
- the target coding unit obtaining module 1102 is used to obtain the target coding unit to be coded.
- the target pixel gradient data obtaining module 1104 is configured to calculate the target pixel gradient data according to the pixel value of the pixel corresponding to the target encoding unit, and the target pixel gradient data is obtained according to the difference between the pixel value of the pixel and the reference pixel value.
- the target division decision result determination module 1106 is configured to determine the target division decision result corresponding to the target coding unit according to the target pixel gradient data.
- the video encoding module 1108 is configured to perform video encoding on the target coding unit according to the result of the target division decision.
- the target pixel gradient data obtaining module 1104 includes:
- the reference pixel value obtaining unit is used to determine the current pixel in the target coding unit, and obtain the pixel value of the target adjacent pixel corresponding to the current pixel as the reference pixel value.
- the pixel value difference calculation unit is used to calculate the difference between the pixel value of the current pixel and the reference pixel value to obtain the pixel value difference corresponding to the current pixel.
- the target pixel gradient data obtaining unit is used to perform statistics on the pixel value difference corresponding to each pixel in the target coding unit to obtain the target pixel gradient data.
- the unit for determining the target adjacent pixel corresponding to the current pixel is used to: obtain the target gradient calculation direction; obtain the adjacent pixel corresponding to the current pixel in the target gradient calculation direction as the target adjacent pixel point.
- the target pixel gradient data obtaining module 1104 includes:
- the pixel gradient calculation unit is used to divide the target coding unit into multiple regions and calculate the pixel gradient corresponding to each region.
- the pixel gradient corresponding to the region is obtained according to the difference between the pixel value of the pixel corresponding to the region and the reference pixel value.
- the gradient difference calculation unit is used to calculate the pixel gradient difference between regions to obtain target pixel gradient data.
- the gradient difference calculation unit is configured to: subtract the pixel gradient corresponding to the second area from the pixel gradient corresponding to the first area to obtain the pixel gradient difference between the first area and the second area.
- the target division decision result determination module 1106 is configured to: when the target pixel gradient data meets the first threshold condition, determine that the target division decision result corresponding to the target coding unit is to terminate division, where the first threshold condition includes that the number of target pixel gradient data items less than the corresponding first threshold is greater than or equal to the first number.
- the target division decision result determination module 1106 is configured to: when the target pixel gradient data meets the second threshold condition, determine that the target division decision result corresponding to the target coding unit is division, where the second threshold condition includes that the number of target pixel gradient data items greater than the corresponding second threshold is greater than or equal to the second number.
- the target pixel gradient data obtaining module 1104 includes:
- the dividing unit is used to divide the target coding unit to obtain multiple sub coding units.
- the sub-coding unit gradient data calculation unit is used to calculate the pixel gradient data corresponding to each sub-coding unit to obtain the target pixel gradient data.
- the pixel gradient corresponding to the sub-coding unit is obtained according to the difference between the pixel value of the pixel corresponding to the sub-coding unit and the reference pixel value.
- the target division decision result determination module 1106 is configured to calculate the pixel gradient data difference between the sub-coding units, and determine the target division decision result corresponding to the target coding unit according to the pixel gradient data difference between the sub-coding units.
- the target division decision result determination module 1106 is configured to determine that the target division decision result corresponding to the target coding unit is division when the pixel gradient data difference between the sub-coding units meets the third threshold judgment condition, where the third threshold judgment condition includes that the number of pixel gradient data differences greater than the corresponding third threshold is greater than or equal to a third number.
- the target division decision result determination module 1106 is configured to: obtain the target number of sub-coding units whose pixel gradient data is greater than the fourth threshold; and when the target number exceeds the fourth number, or when the ratio of the target number to the number of sub-coding units exceeds the first ratio, determine that the target division decision result corresponding to the target coding unit is division.
- the video encoding module 1108 is configured to: when the target division decision result is to terminate division, use the target coding unit as the coding unit for video encoding; and when the target division decision result is division, divide the target coding unit to obtain multiple sub-coding units and perform video encoding according to the sub-coding units.
- Fig. 12 shows an internal structure diagram of a computer device in some embodiments.
- the computer device may specifically be the terminal device in FIG. 1.
- the computer equipment includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus.
- the memory includes a non-volatile storage medium and an internal memory.
- the non-volatile storage medium of the computer device stores an operating system, and may also store computer-readable instructions.
- the processor can realize the video encoding method.
- Computer-readable instructions may also be stored in the internal memory, and when the computer-readable instructions are executed by the processor, the processor can execute the video encoding method.
- the display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen.
- the input device of the computer equipment can be a touch layer covering the display screen, a button, trackball, or touchpad set on the housing of the computer equipment, or an external keyboard, touchpad, or mouse.
- FIG. 12 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
- the specific computer device may include more or fewer parts than shown in the figure, combine some parts, or have a different arrangement of parts.
- the video encoding apparatus may be implemented in a form of computer-readable instructions, and the computer-readable instructions may run on a computer device as shown in FIG. 12.
- the computer equipment can be a terminal or a server.
- the memory of the computer equipment can store the various program modules that make up the video encoding device, for example, the target coding unit obtaining module 1102, the target pixel gradient data obtaining module 1104, the target division decision result determination module 1106, and the video encoding module 1108 shown in FIG. 11.
- the computer-readable instructions formed by each program module cause the processor to execute the steps in the video encoding method of each embodiment of the present application described in this specification.
- the computer device shown in FIG. 12 may obtain the target coding unit to be coded through the target coding unit obtaining module 1102 in the video coding apparatus shown in FIG. 11.
- the target pixel gradient data obtaining module 1104 calculates the target pixel gradient data according to the pixel value of the pixel corresponding to the target encoding unit, and the target pixel gradient data is obtained according to the difference between the pixel value of the pixel and the reference pixel value.
- the target division decision result determination module 1106 determines the target division decision result corresponding to the target coding unit according to the target pixel gradient data.
- the video encoding module 1108 performs video encoding on the target coding unit according to the result of the target division decision.
- a computer device including a memory and a processor.
- the memory stores computer-readable instructions.
- the processor executes the steps of the video encoding method described above.
- the steps of the video encoding method may be the steps in the video encoding method of each of the foregoing embodiments.
- a computer-readable storage medium which stores computer-readable instructions, and when the computer-readable instructions are executed by a processor, the processor executes the steps of the video encoding method described above.
- the steps of the video encoding method may be the steps in the video encoding method of each of the foregoing embodiments.
- a computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
- the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the steps in the foregoing method embodiments.
- Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
- Volatile memory may include random access memory (RAM) or external cache memory.
- RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
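The present application relates to a video encoding method and apparatus, a computer device, and a storage medium. The method includes: obtaining a target coding unit to be encoded; calculating target pixel gradient data according to the pixel values of the pixels corresponding to the target coding unit, the target pixel gradient data being obtained according to the difference between the pixel value of a pixel and a reference pixel value; determining a target division decision result corresponding to the target coding unit according to the target pixel gradient data; and performing video encoding on the target coding unit according to the target division decision result.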
Description
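This application claims priority to Chinese Patent Application No. 201911005290.8, entitled "Video encoding method, apparatus, computer device, and storage medium" and filed with the China Patent Office on October 22, 2019, which is incorporated herein by reference in its entirety.
This application relates to the field of video, and in particular to a video encoding method and apparatus, a computer device, and a storage medium.
With the rapid development and wide application of multimedia and network technologies, video information is used extensively in daily life and production activities. To reduce the amount of data that must be transmitted or stored, video needs to be encoded.
During video encoding, a frame is usually divided into multiple largest coding units, and each largest coding unit is then divided repeatedly using a division structure such as a quadtree to obtain sub-coding units. A rate-distortion cost is then computed for every prediction mode of the largest coding unit and of the sub-coding units, and the division of the coding units is determined from the rate-distortion cost. This approach is computationally expensive, which results in low video encoding efficiency.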
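Summary of the Invention
According to the various embodiments provided in this application, a video encoding method and apparatus, a computer device, and a storage medium are provided.
A video encoding method, executed by a computer device, the method including: obtaining a target coding unit to be encoded; calculating target pixel gradient data according to the pixel values of the pixels corresponding to the target coding unit, the target pixel gradient data being obtained according to the difference between the pixel value of a pixel and a reference pixel value; determining a target division decision result corresponding to the target coding unit according to the target pixel gradient data; and performing video encoding on the target coding unit according to the target division decision result.
A video encoding apparatus, the apparatus including: a target coding unit obtaining module, configured to obtain a target coding unit to be encoded; a target pixel gradient data obtaining module, configured to calculate target pixel gradient data according to the pixel values of the pixels corresponding to the target coding unit, the target pixel gradient data being obtained according to the difference between the pixel value of a pixel and a reference pixel value; a target division decision result determination module, configured to determine a target division decision result corresponding to the target coding unit according to the target pixel gradient data; and a video encoding module, configured to perform video encoding on the target coding unit according to the target division decision result.
A computer device, including a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to execute the steps of the above video encoding method.
One or more non-volatile storage media storing computer-readable instructions that, when executed by one or more processors, cause the processors to execute the steps of the above video encoding method.
The details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of this application will become apparent from the specification, the drawings, and the claims.
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are merely some embodiments of this application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.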
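FIG. 1 is a block diagram of a communication system provided in some embodiments;
FIG. 2 is a block diagram of a video encoder in some embodiments;
FIG. 3 is a flowchart of a video encoding method in some embodiments;
FIG. 4 is a schematic diagram of the division of an image block in some embodiments;
FIG. 5 is a flowchart of calculating target pixel gradient data according to the pixel values of the pixels corresponding to the target coding unit in some embodiments;
FIG. 6 is a schematic diagram of obtaining the target adjacent pixel corresponding to the current pixel in some embodiments;
FIG. 7 is a schematic diagram of gradient calculation directions in some embodiments;
FIG. 8 is a schematic diagram of gradient calculation directions in some embodiments;
FIG. 9 is a flowchart of dividing a target coding unit in some embodiments;
FIG. 10 is a schematic diagram of dividing a target coding unit to obtain sub-coding units in some embodiments;
FIG. 11 is a structural block diagram of a video encoding apparatus in some embodiments; and
FIG. 12 is a block diagram of the internal structure of a computer device in some embodiments.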
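To make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain this application and not to limit it.
It can be understood that the terms "first", "second", and so on used in this application may be used to describe various elements, but unless otherwise specified, these elements are not limited by these terms. The terms are only used to distinguish one element from another. For example, without departing from the scope of this application, a first region may be called a second region, and similarly, a second region may be called a first region.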
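FIG. 1 is a simplified block diagram of a communication system (100) according to an embodiment disclosed in this application. The communication system (100) includes multiple terminal devices that can communicate with each other through, for example, a network (150). For example, the communication system (100) includes a first terminal device (110) and a second terminal device (120) interconnected through the network (150). In the embodiment of FIG. 1, the first terminal device (110) and the second terminal device (120) perform unidirectional data transmission. For example, the first terminal device (110) may encode video data (such as a stream of video pictures captured by the terminal device (110)) for transmission to the second terminal device (120) through the network (150). The encoded video data is transmitted in the form of one or more encoded video bitstreams. The second terminal device (120) may receive the encoded video data from the network (150), decode it to recover the video data, and display video pictures according to the recovered video data. Unidirectional data transmission is common in applications such as media services.
In another embodiment, the communication system (100) includes a third terminal device (130) and a fourth terminal device (140) that perform bidirectional transmission of encoded video data, which may occur, for example, during a video conference. For bidirectional data transmission, each of the third terminal device (130) and the fourth terminal device (140) may encode video data (such as a stream of video pictures captured by that terminal device) for transmission through the network (150) to the other of the two terminal devices. Each of the third terminal device (130) and the fourth terminal device (140) may also receive the encoded video data transmitted by the other terminal device, decode it to recover the video data, and display video pictures on an accessible display device according to the recovered video data.
In the embodiment of FIG. 1, the first terminal device (110), the second terminal device (120), the third terminal device (130), and the fourth terminal device (140) may be servers, personal computers, and smartphones, but the principles disclosed in this application are not limited thereto. The embodiments disclosed in this application are applicable to laptop computers, tablet computers, media players, and/or dedicated video conferencing equipment. The network (150) represents any number of networks that convey encoded video data among the first terminal device (110), the second terminal device (120), the third terminal device (130), and the fourth terminal device (140), including, for example, wired and/or wireless communication networks. The communication network (150) may exchange data in circuit-switched and/or packet-switched channels. The network may include a telecommunications network, a local area network, a wide area network, and/or the Internet. For the purposes of this application, the architecture and topology of the network (150) may be immaterial to the operations disclosed in this application unless explained below.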
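FIG. 2 is a block diagram of a video encoder (203) according to an embodiment disclosed in this application. The video encoder (203) is disposed in an electronic device (220). The electronic device (220) includes a transmitter (240) (for example, a transmission circuit).
The video encoder (203) may receive video samples from a video source (201) (which is not part of the electronic device (220) in the embodiment of FIG. 2), and the video source may capture video images to be encoded by the video encoder (203). In another embodiment, the video source (201) is part of the electronic device (220).
The video source (201) may provide the source video sequence to be encoded by the video encoder (203) in the form of a digital video sample stream, which may have any suitable bit depth (for example, 8 bits, 10 bits, 12 bits, ...), any color space (for example, BT.601 Y CrCB, RGB, ...), and any suitable sampling structure (for example, Y CrCb 4:2:0, Y CrCb 4:4:4). In a media service system, the video source (201) may be a storage device that stores previously prepared video. In a video conferencing system, the video source (201) may be a camera that captures local image information as a video sequence. The video data may be provided as multiple individual pictures that impart motion when viewed in sequence. Each picture may itself be organized as a spatial array of pixels, where each pixel may include one or more samples depending on the sampling structure, color space, and so on in use. Those skilled in the art can readily understand the relationship between pixels and samples. The following description focuses on samples.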
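According to an embodiment, the video encoder (203) may encode and compress the pictures of the source video sequence into an encoded video sequence (243) in real time or under any other time constraint required by the application. Enforcing the appropriate encoding speed is one function of the controller (250). In some embodiments, the controller (250) controls the other functional units described below and is functionally coupled to them. For brevity, the coupling is not shown in the figure. The parameters set by the controller (250) may include rate-control-related parameters (picture skip, quantizer, the λ value of rate-distortion optimization techniques, and so on), picture size, group of pictures (GOP) layout, maximum motion vector search range, and so on. The controller (250) may have other suitable functions that relate to the video encoder (203) optimized for a certain system design.
In some embodiments, the video encoder (203) operates in a coding loop. As a simplified description, in an embodiment the coding loop may include a source encoder (230) (responsible, for example, for creating symbols, such as a symbol stream, based on the input picture to be encoded and the reference pictures) and a (local) decoder (233) embedded in the video encoder (203). The decoder (233) reconstructs the symbols to create sample data in a manner similar to how a (remote) decoder would create it (because, in the video compression technologies considered in this application, any compression between the symbols and the encoded video bitstream is lossless). The reconstructed sample stream (sample data) is input to the reference picture memory (234). Because decoding a symbol stream produces bit-exact results regardless of the decoder location (local or remote), the content of the reference picture memory (234) is also bit-exact between the local encoder and the remote encoder. In other words, the reference picture samples that the prediction part of the encoder "sees" are exactly the same sample values that a decoder would "see" when using prediction during decoding. This basic principle of reference picture synchronicity (and the drift that results when synchronicity cannot be maintained, for example because of channel errors) is also used in some related technologies.
The operation of the "local" decoder (233) may be the same as that of a "remote" decoder, that is, the decoder at the decoding end.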
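During operation, in some embodiments, the source encoder (230) may perform motion-compensated predictive coding, which predictively codes an input picture with reference to one or more previously encoded pictures from the video sequence that are designated as "reference pictures". In this way, the coding engine (232) codes the differences between pixel blocks of the input picture and pixel blocks of the reference pictures that may be selected as prediction references for the input picture.
The local video decoder (233) may decode the encoded video data of pictures that may be designated as reference pictures, based on the symbols created by the source encoder (230). The operation of the coding engine (232) may be a lossy process. When the encoded video data is decoded at a video decoder (not shown in FIG. 2), the reconstructed video sequence is typically a replica of the source video sequence with some errors. The local video decoder (233) replicates the decoding process that the video decoder would perform on reference pictures and may cause the reconstructed reference pictures to be stored in the reference picture cache (234). In this way, the video encoder (203) may locally store copies of the reconstructed reference pictures that have the same content as the reconstructed reference pictures that the far-end video decoder will obtain (in the absence of transmission errors).
The predictor (235) may perform prediction searches for the coding engine (232). That is, for a new picture to be encoded, the predictor (235) may search the reference picture memory (234) for sample data (as candidate reference pixel blocks) or for certain metadata, such as reference picture motion vectors and block shapes, that may serve as an appropriate prediction reference for the new picture. The predictor (235) may operate on a sample block by pixel block basis to find suitable prediction references. In some cases, as determined by the search results obtained by the predictor (235), an input picture may have prediction references drawn from multiple reference pictures stored in the reference picture memory (234).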
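The controller (250) may manage the coding operations of the source encoder (230), including, for example, setting the parameters and subgroup parameters used to encode the video data.
The outputs of all of the above functional units may be entropy coded in the entropy encoder (245). The entropy encoder (245) losslessly compresses the symbols generated by the various functional units using techniques such as Huffman coding, variable-length coding, and arithmetic coding, thereby converting the symbols into an encoded video sequence.
The transmitter (240) may buffer the encoded video sequence created by the entropy encoder (245) to prepare it for transmission through a communication channel (260), which may be a hardware/software link to a storage device that will store the encoded video data. The transmitter (240) may merge the encoded video data from the video encoder (203) with other data to be transmitted, such as encoded audio data and/or auxiliary data streams (sources not shown).
The controller (250) may manage the operation of the video encoder (203). During encoding, the controller (250) may assign each encoded picture a certain encoded picture type, which may affect the coding techniques that can be applied to that picture. For example, a picture may typically be assigned one of the following picture types: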
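An intra picture (I picture) is a picture that can be encoded and decoded without using any other picture in the sequence as a prediction source. Some video codecs allow different types of intra pictures, including, for example, Independent Decoder Refresh ("IDR") pictures. Those skilled in the art are aware of the variants of I pictures and their respective applications and features.
A predictive picture (P picture) is a picture that can be encoded and decoded using intra prediction or inter prediction that uses at most one motion vector and reference index to predict the sample values of each block.
A bi-directionally predictive picture (B picture) is a picture that can be encoded and decoded using intra prediction or inter prediction that uses at most two motion vectors and reference indices to predict the sample values of each block. Similarly, multiple-predictive pictures may use more than two reference pictures and associated metadata to reconstruct a single block.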
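A source picture may typically be subdivided spatially into multiple sample blocks (for example, blocks of 4×4, 8×8, 4×8, or 16×16 samples) and encoded block by block. These blocks may be predictively coded with reference to other (already encoded) blocks, which are determined by the coding assignment applied to the blocks' respective pictures. For example, blocks of an I picture may be coded non-predictively, or they may be coded predictively with reference to already encoded blocks of the same picture (spatial prediction or intra prediction). Pixel blocks of a P picture may be coded predictively, through spatial prediction or temporal prediction, with reference to one previously encoded reference picture. Blocks of a B picture may be coded predictively, through spatial prediction or temporal prediction, with reference to one or two previously encoded reference pictures.
The video encoder (203) may perform coding operations according to a predetermined video coding technology or standard, such as ITU-T Recommendation H.265. In operation, the video encoder (203) may perform various compression operations, including predictive coding operations that exploit temporal and spatial redundancy in the input video sequence. The encoded video data may therefore conform to the syntax specified by the video coding technology or standard in use.
In an embodiment, the transmitter (240) may transmit additional data when transmitting the encoded video. The source encoder (230) may include such data as part of the encoded video sequence. The additional data may include temporal/spatial/SNR enhancement layers, other forms of redundant data such as redundant pictures and slices, SEI messages, VUI parameter set fragments, and so on.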
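The captured video may be provided as multiple source pictures (video pictures) in a temporal sequence. Intra-picture prediction (often shortened to intra prediction) exploits spatial correlation within a given picture, while inter-picture prediction exploits (temporal or other) correlation between pictures. In an embodiment, the specific picture being encoded/decoded, which is called the current picture, is partitioned into blocks. When a block in the current picture is similar to a reference block in a previously encoded and still buffered reference picture in the video, the block in the current picture may be coded by a vector called a motion vector. The motion vector points to the reference block in the reference picture, and when multiple reference pictures are used, the motion vector may have a third dimension that identifies the reference picture.
In some embodiments, a bi-prediction technique may be used in inter-picture prediction. According to the bi-prediction technique, two reference pictures are used, such as a first reference picture and a second reference picture that both precede the current picture in the video in decoding order (but may be in the past and the future, respectively, in display order). A block in the current picture may be coded by a first motion vector that points to a first reference block in the first reference picture and a second motion vector that points to a second reference block in the second reference picture. Specifically, the block may be predicted by a combination of the first reference block and the second reference block.
In addition, a merge mode technique may be used in inter-picture prediction to improve coding efficiency.
According to some embodiments disclosed in this application, predictions such as inter-picture prediction and intra-picture prediction are performed in units of blocks. For example, according to the HEVC standard, a picture in a sequence of video pictures is partitioned into coding tree units (CTUs) for compression, and the CTUs in a picture have the same size, such as 64×64 pixels, 32×32 pixels, or 16×16 pixels. In general, a CTU includes three coding tree blocks (CTBs): one luma CTB and two chroma CTBs. Furthermore, each CTU may be split by a quadtree into one or more coding units (CUs). For example, a 64×64-pixel CTU may be split into one 64×64-pixel CU, four 32×32-pixel CUs, or sixteen 16×16-pixel CUs. In an embodiment, each CU is analyzed to determine a prediction type for the CU, such as an inter prediction type or an intra prediction type. In addition, depending on temporal and/or spatial predictability, the CU is split into one or more prediction units (PUs). Typically, each PU includes a luma prediction block (PB) and two chroma PBs. In an embodiment, a prediction operation in coding (encoding/decoding) is performed in units of prediction blocks. Taking a luma prediction block as an example, the prediction block includes a matrix of pixel values (for example, luma values), such as 8×8 pixels, 16×16 pixels, 8×16 pixels, 16×8 pixels, and so on.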
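As shown in FIG. 3, in some embodiments, a video encoding method is provided. This embodiment is mainly described using the example of the method being applied to the terminal device in FIG. 1 above. The method may specifically include the following steps:
Step S302: obtain a target coding unit to be encoded.
Specifically, the target coding unit is a coding unit for which it must be determined whether division is needed. During video encoding, a video frame may be obtained and the video frame image may be divided in units of the LCU (Largest Coding Unit). That is, the video frame image may be divided into multiple largest coding units (LCUs) of a preset size, and it is then determined recursively whether each coding unit can be divided into smaller coding units (CUs) until the coding units have been divided into the smallest coding unit, forming a coding tree unit (CTU) structure. The target coding unit may be an LCU or a coding unit obtained by dividing an LCU. For example, when a video image frame is obtained and the LCU is assumed to be 64*64 pixels, the terminal device may divide the video image frame into multiple 64*64-pixel LCUs and use each LCU as a target coding unit. After an LCU is divided into 4 coding units (32*32 pixels), those 4 coding units may be used as target coding units.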
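As a small illustration of the first step, the sketch below lists the top-left corners of the LCUs that tile a frame. The function name is hypothetical, and the treatment of frames whose dimensions are not multiples of the LCU size (the last row or column simply starts inside the frame and would be cropped or padded by the encoder) is an assumption that the text does not discuss.

```python
def lcu_origins(frame_width, frame_height, lcu_size=64):
    """Top-left (x, y) corner of every LCU covering the frame, row by row."""
    return [(x, y)
            for y in range(0, frame_height, lcu_size)
            for x in range(0, frame_width, lcu_size)]
```

For a 1920×1080 frame and 64*64-pixel LCUs this yields 30 columns and 17 rows, so 510 initial target coding units.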
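Of course, the LCU division described above is only an example used to illustrate one implementation of the solution of this application, and the solution is not limited to this division.
Step S304: calculate target pixel gradient data according to the pixel values of the pixels corresponding to the target coding unit, where the target pixel gradient data is obtained according to the difference between the pixel value of a pixel and a reference pixel value.
Specifically, the pixel gradient reflects how pixel values change within the image. A larger pixel gradient indicates larger pixel variation and therefore larger variation in the image content of the coding unit; a smaller pixel gradient indicates smaller pixel variation and therefore smaller variation in the image content of the coding unit. The pixel gradient may be obtained from the difference between the pixel value of a pixel and the reference pixel value: the larger the difference, the larger the gradient. The difference may be expressed as a subtraction result or as a ratio. For example, the reference pixel value may be subtracted from the pixel value of the pixel and the absolute value of the result used as the pixel difference. Alternatively, the pixel value of the pixel may be divided by the reference pixel value and the resulting ratio used as the pixel difference.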
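Pixel gradient data is data related to the pixel gradient; for example, it may include at least one of the pixel gradient itself or the difference between pixel gradients. The reference pixel value is the pixel value against which the pixel gradient is determined. The reference pixel value may be obtained from the pixel values of the target coding unit. For example, the pixel value of any one pixel in the target coding unit may be selected as the reference pixel value, or the pixel values of the target coding unit may be aggregated to calculate a pixel mean, which is then used as the reference pixel value.
In some embodiments, the difference between the pixel value of each pixel and the reference pixel value, that is, the pixel gradient corresponding to each pixel, may be calculated, and the pixel gradients of the pixels may be combined to obtain the target gradient data. For example, the pixel gradients of the pixels may be accumulated to obtain the target pixel gradient data.
In some embodiments, different pixels in the target coding unit may correspond to different reference pixels, and the pixel value of the reference pixel corresponding to a pixel may be used as the reference pixel value. For example, the pixel value of a pixel adjacent to the pixel may be used as the reference pixel value. The terminal device may use the pixel value of one of the adjacent pixels as the reference pixel value, or it may combine the pixel values of multiple adjacent pixels, for example by using the mean pixel value of the adjacent pixels as the reference pixel value.
In some embodiments, there may be one or more items of target pixel gradient data. For example, the target pixel gradient data may include at least one of global gradient data or local gradient data. Global gradient data is obtained from the difference between the pixel value of a pixel and the pixel average; that is, the reference pixel value may be the pixel average, so that the data reflects the pixel variation of the target coding unit as a whole. Local gradient data is obtained from the difference between the pixel value of a pixel and the pixel values of other pixels in the target coding unit. That is, when calculating the pixel gradient corresponding to a pixel, the difference between the pixel value of that pixel and the pixel values of other pixels in the target coding unit may be calculated to obtain the pixel gradient corresponding to that pixel, where the other pixels used as reference pixels may be pixels adjacent to that pixel.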
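Step S306: determine the target division decision result corresponding to the target coding unit according to the target pixel gradient data.
Specifically, the target division decision result may be to terminate division or to divide. That is, the candidate division decision results may include terminating division and performing division. Terminating division means that the target coding unit is not divided. Division means that the target coding unit needs to be divided into multiple sub-coding units.
In some embodiments, the target division decision result may also be undetermined. If it is undetermined, whether the target coding unit needs to be divided may be determined by a conventional method for determining the division of coding units. For example, the rate-distortion cost of each sub-coding unit obtained by dividing the target coding unit may be calculated, and the sum of these rate-distortion costs may be compared with the rate-distortion cost of the undivided target coding unit. If the sum of the rate-distortion costs of the sub-coding units is smaller, the target coding unit needs to be divided. If the rate-distortion cost of the undivided target coding unit is smaller, the target coding unit does not need to be divided.
When determining the target division decision result corresponding to the target coding unit according to the target pixel gradient data, the target pixel gradient data may be compared with preset thresholds. If it is less than a preset first gradient threshold, the gradient difference or the gradient is relatively small and the content of the target coding unit changes little, so the division decision result may be to terminate division. If it is greater than a preset second gradient threshold, the gradient difference or the gradient is relatively large, so the division decision result may be to divide. The second gradient threshold may be greater than or equal to the first gradient threshold.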
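The two-threshold comparison can be sketched as a three-way decision. This is an illustrative sketch only; the function name and the string labels are hypothetical, and values between the two thresholds are reported as undetermined so that another method (for example, the rate-distortion comparison) can take over.

```python
def decide_from_gradient(gradient, first_threshold, second_threshold):
    """Map one gradient value to 'terminate', 'divide', or 'undetermined'."""
    if second_threshold < first_threshold:
        raise ValueError("second threshold must be >= first threshold")
    if gradient < first_threshold:
        return "terminate"      # little content change: keep the block whole
    if gradient > second_threshold:
        return "divide"         # strong content change: split the block
    return "undetermined"       # leave the decision to another method
```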
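In some embodiments, when there are multiple items of target pixel gradient data, the division decision result may be to terminate division if all of the target gradient data, or more than a preset number of the target gradient data, are less than the preset first gradient threshold. Likewise, the division decision result may be to divide if all of the target gradient data, or more than a preset number of the target gradient data, are greater than the preset second gradient threshold.
Step S308: perform video encoding on the target coding unit according to the target division decision result.
Specifically, after the target division decision result is obtained, video encoding may be performed on the target coding unit according to the target division decision result.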
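In some embodiments, when the target division decision result is to terminate division, video encoding is performed with the target coding unit as the coding unit. When the division result corresponding to the target coding unit is to terminate division, the subsequent division process for the target coding unit ends directly and the rate-distortion cost is no longer used to determine whether the target coding unit needs to be divided, which saves encoding time. When termination is determined, the terminal device may use the target coding unit as the unit for obtaining a reference block (reference unit), obtain the reference block corresponding to the target coding unit, obtain the predicted pixel values of the target coding unit from the reference block, subtract the predicted pixel values from the actual pixel values of the target coding unit to obtain the prediction residual, apply processing such as quantization and entropy coding to the prediction residual to obtain the encoded data, and store the encoded data locally or transmit it to another terminal device.
In some embodiments, when the target division decision result is division, the terminal device divides the target coding unit to obtain multiple sub-coding units and performs video encoding according to the sub-coding units.
Specifically, when the target division decision result is division, the process of performing intra prediction on the target coding unit to determine its rate-distortion cost can be omitted and the target coding unit can be divided directly, which saves the rate-distortion optimization calculation for the target coding unit. The division method may be set as required, for example, quadtree division, ternary tree division, or binary tree division. M-ary tree division means dividing the target coding unit into M parts, where M may be an integer greater than or equal to 2. For a sub-coding unit obtained by division, if the sub-coding unit is the smallest coding unit specified in the video coding standard, division may stop. If the sub-coding unit is not the smallest coding unit specified in the video coding standard, the sub-coding unit may be used as a new target coding unit and the division decision may continue.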
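The recursion described here, in which each sub-coding unit becomes a new target coding unit until the smallest size is reached, can be sketched for the quadtree case as follows. The `decide` callback stands in for the gradient-based decision and is a hypothetical interface; the sketch also assumes square blocks and a smallest coding unit of 4 pixels.

```python
def partition(x, y, size, decide, min_size=4):
    """Return the leaf coding units (x, y, size) of a quadtree partition.

    decide(x, y, size) is a hypothetical callback that returns "divide"
    when the block at (x, y) should be split into four sub-coding units.
    """
    # Stop at the smallest coding unit or when the decision is not to divide.
    if size <= min_size or decide(x, y, size) != "divide":
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            # Each sub-coding unit becomes a new target coding unit.
            leaves += partition(x + dx, y + dy, half, decide, min_size)
    return leaves
```

For example, a callback that divides only blocks larger than 32 pixels turns a 64*64 block into four 32*32 leaves.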
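In the above video encoding method, the target pixel gradient data can be calculated from the pixel values of the coding unit. Because the target pixel gradient data is obtained from the difference between the pixel value of a pixel and the reference pixel value, it reflects how the image of the target coding unit changes. Determining the target division decision result corresponding to the target coding unit according to the target pixel gradient data is therefore highly accurate and reduces the time needed to obtain the division decision result, which shortens the encoding time and improves video encoding efficiency.
For example, FIG. 4 is a schematic diagram of the division of a 64*64-pixel image block, in which each square represents a coding block. First, the 64*64-pixel image block may be used as the target coding unit, and the method provided in the embodiments of this application is executed to obtain a division decision result of division for the target coding unit. The 64*64-pixel image block may therefore be divided into 4 sub-coding units. These 4 sub-coding units are then used as new target coding units, and the method provided in the embodiments of this application continues to be executed until 4*4-pixel coding units are obtained. Because a 4*4-pixel coding unit is the smallest coding unit of the video coding standard, the division decision may stop.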
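In some embodiments, as shown in FIG. 5, step S304, that is, calculating the target pixel gradient data according to the pixel values of the pixels corresponding to the target coding unit, includes the following steps:
Step S502: determine the current pixel in the target coding unit, and obtain the pixel value of the target adjacent pixel corresponding to the current pixel as the reference pixel value.
Specifically, the current pixel is the pixel in the target coding unit whose pixel gradient currently needs to be calculated. Every pixel may be treated as the current pixel at the same time, or pixels may be taken as the current pixel one at a time or several at a time. The target adjacent pixel may be one or more of the pixels adjacent to the current pixel. For example, as shown in FIG. 6, each small square represents one pixel. Assuming the current pixel in the target coding unit is X1, the target adjacent pixels may be all of the pixels A to H, or one or more of the pixels A to H may be selected as the target adjacent pixels.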
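In some embodiments, the target gradient calculation direction may be obtained, and the adjacent pixel corresponding to the current pixel in the target gradient calculation direction may be obtained as the target adjacent pixel.
Specifically, the target gradient calculation direction is the direction in which the gradient is calculated. The gradient calculation direction may be determined as required and may include, for example, at least one of the horizontal direction, the vertical direction, the 45-degree direction, or the 135-degree direction. The horizontal direction may be from left to right or from right to left, and the vertical direction is from top to bottom or from bottom to top. The 45-degree direction is the direction from the lower-left corner of the target coding unit to the upper-right corner, and the 135-degree direction is the direction from the lower-right corner of the target coding unit to the upper-left corner. FIG. 7 is a schematic diagram of the gradient calculation directions in some embodiments.
The adjacent pixel corresponding to the current pixel in the target gradient calculation direction is the forward adjacent pixel of the current pixel in that direction, that is, the adjacent pixel that precedes the current pixel in the target gradient calculation direction. For example, if the target gradient calculation direction is the horizontal direction in FIG. 7, the target adjacent pixel for the current pixel X1 is H. If the target gradient calculation direction is the vertical direction in FIG. 7, the target adjacent pixel for the current pixel X1 is B. If the target gradient calculation direction is the 45-degree direction in FIG. 7, the target adjacent pixel for the current pixel X1 is G. If the target gradient calculation direction is the 135-degree direction in FIG. 7, the target adjacent pixel for the current pixel X1 is E.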
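Step S504: calculate the difference between the pixel value of the current pixel and the reference pixel value to obtain the pixel value difference corresponding to the current pixel.
Specifically, the difference between the pixel value of the current pixel and the reference pixel value may be derived from their subtraction result, for example, its absolute value or its square.
Step S506: aggregate the pixel value differences corresponding to the pixels in the target coding unit to obtain the target pixel gradient data.
Specifically, the aggregation may be taking the mean or the median. For example, the pixel value differences corresponding to the pixels in the target coding unit are summed, and the sum is divided by the number of pixels to obtain the target pixel gradient data.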
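In some embodiments, the target pixel gradient data may include the local pixel gradients corresponding to one or more target gradient calculation directions, for example, the local pixel gradients corresponding to the horizontal direction, the vertical direction, the 45-degree direction, and the 135-degree direction. The local pixel gradients corresponding to the horizontal, vertical, 45-degree, and 135-degree directions may be expressed by formulas (1) to (4), respectively. In the formulas, LocalGradient denotes the local pixel gradient, LocalGradient_HOR denotes the local pixel gradient in the horizontal direction, LocalGradient_VER denotes the local pixel gradient in the vertical direction, LocalGradient_45 denotes the local pixel gradient in the 45-degree direction, and LocalGradient_135 denotes the local pixel gradient in the 135-degree direction. Width denotes the number of pixels in the width direction, that is, the horizontal direction, and Height denotes the number of pixels in the height direction, that is, the vertical direction. P denotes a pixel value; for example, $P_{i,j}$ denotes the pixel value in row i and column j of the target coding unit. abs denotes taking the absolute value, and ∑ denotes summation.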
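Formulas (1) to (4) themselves did not survive in this text, so the following is only a sketch of what a directional local gradient could look like, based on the symbol definitions above (a sum of absolute differences). The neighbor offsets are inferred from the forward-neighbor examples given earlier (left for horizontal, above for vertical, lower-left for 45 degrees, lower-right for 135 degrees). Skipping pixels whose neighbor falls outside the block and leaving the sum unnormalized are assumptions, and `OFFSETS` and `local_gradient` are hypothetical names.

```python
# (row offset, column offset) of the assumed forward neighbour per direction.
OFFSETS = {
    "hor": (0, -1),   # left neighbour
    "ver": (-1, 0),   # neighbour above
    "45": (1, -1),    # lower-left neighbour
    "135": (1, 1),    # lower-right neighbour
}


def local_gradient(block, direction):
    """Sum of abs(P[i][j] - P[neighbour]) over pixels that have a neighbour."""
    di, dj = OFFSETS[direction]
    height, width = len(block), len(block[0])
    total = 0
    for i in range(height):
        for j in range(width):
            ni, nj = i + di, j + dj
            if 0 <= ni < height and 0 <= nj < width:
                total += abs(block[i][j] - block[ni][nj])
    return total
```

A block with a vertical edge, such as `[[0, 10], [0, 10]]`, has a large horizontal gradient and a zero vertical gradient under this definition.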
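In some embodiments, step S304, that is, calculating the target pixel gradient data according to the pixel values of the pixels corresponding to the target coding unit, includes the following steps: dividing the target coding unit into multiple regions and calculating the pixel gradient corresponding to each region, where the pixel gradient corresponding to a region is obtained according to the difference between the pixel values of the pixels corresponding to the region and the reference pixel value; and calculating the pixel gradient difference between regions to obtain the target pixel gradient data.
Specifically, "multiple" means two or more. The method of dividing the target coding unit into multiple regions may be set as required; for example, the target coding unit may be divided into two regions. The division into two regions may be at least one of horizontal division, vertical division, or diagonal division. For example, if the target coding unit is a square block, that is, the number of pixels in the length direction equals the number in the width direction, horizontal division, vertical division, and diagonal division may all be performed. If the target coding unit is a non-square block and its width is greater than its height, vertical division may be performed, dividing the target coding unit into a left half and a right half. If its height is greater than its width, horizontal division may be performed, dividing the target coding unit into an upper half and a lower half.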
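The shape-dependent choice of region division can be sketched as below. The function names are hypothetical, and only the two axis-aligned splits are materialized; following the text, "vertical" division yields left and right halves, and "horizontal" division yields upper and lower halves.

```python
def region_splits(width, height):
    """Which two-region divisions apply to a block of the given shape."""
    if width == height:
        return ["horizontal", "vertical", "diagonal"]
    # Wider than tall: left/right halves. Taller than wide: upper/lower halves.
    return ["vertical"] if width > height else ["horizontal"]


def halves(block, split):
    """Split a 2-D pixel list into two regions for an axis-aligned division."""
    height, width = len(block), len(block[0])
    if split == "vertical":
        return ([row[: width // 2] for row in block],
                [row[width // 2:] for row in block])
    if split == "horizontal":
        return block[: height // 2], block[height // 2:]
    raise ValueError("only 'vertical' and 'horizontal' are handled here")
```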
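The reference pixel value may be a pixel mean, which may be the mean of the pixels corresponding to the target coding unit or the mean of the pixels corresponding to the region. It may also be the pixel value of a pixel adjacent to the pixel in question. The pixel gradient difference between regions may be expressed as the absolute value of the difference between the pixel gradients of the two regions, the square of that difference, or their ratio. The pixel gradient corresponding to a region may be a statistic of the pixel differences corresponding to the pixels in the region, such as their sum or their average.
In some embodiments, calculating the pixel gradient difference between regions includes: subtracting the pixel gradient corresponding to the second region from the pixel gradient corresponding to the first region to obtain the pixel gradient difference between the first region and the second region. The first region and the second region may be any regions of the target coding unit; for example, the first region may be the upper half of the target coding unit and the second region may be the lower half.
In some embodiments, the gradient is directional; for example, it may include a gradient in the horizontal direction and a gradient in the vertical direction. For the same coding unit, gradients in different directions carry different meanings. Therefore, when gradients are compared, for example subtracted, pixel gradients of the same direction are subtracted from each other. For example, the vertical pixel gradient of the first region is subtracted from the vertical pixel gradient of the second region.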
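In some embodiments, the target pixel gradient data may include the gradient differences of the global pixel gradients corresponding to one or more gradient calculation directions, for example, those corresponding to the horizontal direction, the vertical direction, the 45-degree direction, and the 135-degree direction. As shown in FIG. 8, the global pixel gradient difference in the horizontal direction may be the difference between the global pixel gradients of the left half and the right half. The global pixel gradient difference in the vertical direction may be the difference between the global pixel gradients of the upper half and the lower half. The global pixel gradient difference in the 45-degree direction may be the difference between the global pixel gradients of the lower-left part and the upper-right part. The global pixel gradient difference in the 135-degree direction may be the difference between the global pixel gradients of the upper-left part and the lower-right part.
In some embodiments, the global pixel gradient differences corresponding to the horizontal, vertical, 45-degree, and 135-degree directions may be expressed by formulas (5) to (8), respectively. In the formulas, GlobalGradient denotes the global pixel gradient, GlobalGradient_HOR denotes the global pixel gradient difference in the horizontal direction, GlobalGradient_VER denotes the global pixel gradient difference in the vertical direction, GlobalGradient_45 denotes the global pixel gradient difference in the 45-degree direction, and GlobalGradient_135 denotes the global pixel gradient difference in the 135-degree direction. N denotes the number of pixels in the length direction and in the width direction; the description here assumes that the target coding unit has the same number of pixels in both directions, although the two numbers may also differ. abs denotes taking the absolute value, and ∑ denotes summation. P denotes a pixel value; for example, $P_{i,j}$ denotes the pixel value in row i and column j of the target coding unit, and $P_{avg}$ denotes the pixel average.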
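Formulas (5) to (8) are not reproduced in this text, so the sketch below is an assumed reading of the description: each region's global gradient is the sum of abs(P − P_avg) over its pixels, with P_avg taken over the whole block, and each directional value is the absolute difference between the two opposing regions. Excluding pixels that lie exactly on the dividing diagonal, and the function name itself, are further assumptions. A square N x N block is required.

```python
def global_gradient_differences(block):
    """Gradient differences between opposing regions of a square block."""
    n = len(block)
    p_avg = sum(sum(row) for row in block) / (n * n)
    half = n // 2
    cells = [(i, j) for i in range(n) for j in range(n)]

    def gradient(keep):
        # Sum of |P(i, j) - P_avg| over the pixels selected by `keep`.
        return sum(abs(block[i][j] - p_avg) for i, j in cells if keep(i, j))

    pairs = {
        # left half vs right half
        "hor": (lambda i, j: j < half, lambda i, j: j >= half),
        # upper half vs lower half
        "ver": (lambda i, j: i < half, lambda i, j: i >= half),
        # lower-left part vs upper-right part
        "45": (lambda i, j: i > j, lambda i, j: i < j),
        # upper-left part vs lower-right part
        "135": (lambda i, j: i + j < n - 1, lambda i, j: i + j > n - 1),
    }
    return {name: abs(gradient(a) - gradient(b))
            for name, (a, b) in pairs.items()}
```

For the 2x2 block `[[0, 0], [0, 8]]` the pixel average is 2, so the bright lower-right pixel makes the horizontal, vertical, and 135-degree differences equal to 4 while the 45-degree difference is 0.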
在一些实施例中,可以判断目标像素梯度数据是否满足第一阈值条件,当当目标像素梯度数据满足第一阈值条件时,确定目标编码单元对应的目标划分决策结果为终止划分。其中,第一阈值条件包括小于对应的第一阈值的目标像素梯度数据满足第一数量。
具体地,满足第一数量是指大于或者等于第一数量,第一数量可以根据需要设置,例如可以是全部的目标像素梯度数据对应的数量,也可以是部分,第一数量也可以是预设的值,例如5。如果小于对应的第一阈值的目标像素梯度数据大于或者等于第一数量,则说明目标编码单元的图像内容变化较小,满足不进行划分的条件,因此可以终止划分。
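In some embodiments, the first threshold can be obtained from experience or experiment. For example, the target gradient data observed when partitioning is performed and when it is terminated can be collected over video frames with different content, and the value at which the decision to terminate partitioning reaches a preset accuracy, for example 90%, is selected as the first threshold.
In some embodiments, different quantization parameters may correspond to different first thresholds, and target coding units of different sizes may also correspond to different first thresholds. For example, experimental statistics gathered under different quantization parameters can give, for each of them, the gradient value at which the decision to terminate partitioning reaches the preset accuracy, and that value is used as the first threshold.
In some embodiments, different items of target pixel gradient data may correspond to different first thresholds. For example, experiments show that the target pixel gradient data in the 45-degree and 135-degree directions is larger than that in the horizontal and vertical directions, so a larger first threshold can be set for the 45-degree and 135-degree directions than for the horizontal and vertical directions. For example, the first threshold for the 45-degree and 135-degree directions may be a times the first threshold for the horizontal direction, where a is greater than 1, for example 1.414. As another example, the first threshold for global gradients may differ from the first threshold for local gradients.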
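As a small illustration of direction-dependent thresholds, the following Python sketch derives the diagonal thresholds from a base value by the factor a. The function name and the dictionary keys are hypothetical; 1.414 is simply the example value mentioned in the text.

```python
from typing import Dict


def direction_thresholds(base: float, a: float = 1.414) -> Dict[str, float]:
    """First thresholds per direction; diagonal directions are scaled by a (a > 1)."""
    if a <= 1:
        raise ValueError("a is expected to be greater than 1")
    return {"HOR": base, "VER": base, "45": a * base, "135": a * base}
```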
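In some embodiments, the first threshold may vary. For example, for the first CTU of a video frame, the first threshold may be preset. For other CTUs that have adjacent CTUs, the target gradient data of the adjacent CTUs can be obtained and the first threshold derived from it. For example, the largest gradient data among the CTUs above and to the left of the current CTU can be used as the first threshold, or the average gradient data of the CTUs above and to the left can be used. Because adjacent CTUs are highly similar, updating the first threshold with the target gradient data of adjacent CTUs lets the condition for the partition decision change dynamically with the image content, which improves the accuracy of the partition decision result.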
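The neighbour-based update can be sketched as follows in Python. It assumes each neighbouring CTU is summarised by a single gradient value and that a missing neighbour is passed as `None`; both are simplifications for illustration, and `first_threshold_for_ctu` is a hypothetical name.

```python
from typing import Optional


def first_threshold_for_ctu(above: Optional[float],
                            left: Optional[float],
                            preset: float,
                            mode: str = "max") -> float:
    """Derive the first threshold from the gradients of the above and left CTUs."""
    neighbours = [g for g in (above, left) if g is not None]
    if not neighbours:
        return preset  # e.g. the first CTU of the frame has no coded neighbours
    if mode == "max":
        return max(neighbours)
    return sum(neighbours) / len(neighbours)  # "mean" mode
```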
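In some embodiments, determining the target partition decision result for the target coding unit from the target pixel gradient data includes: when the target pixel gradient data satisfies a second threshold condition, determining that the target partition decision result for the target coding unit is to partition. The second threshold condition is that the number of items of target pixel gradient data greater than their corresponding second thresholds reaches a second number.
Specifically, reaching the second number means being greater than or equal to the second number. The first threshold and the second threshold may be the same or different. The second number can be set as needed: it can be the total number of items of target pixel gradient data, a part of that total, or a preset value. If the number of items of target pixel gradient data greater than their corresponding second thresholds reaches the second number, the image content of the target coding unit varies considerably and the condition for partitioning is met, so the target coding unit can be partitioned.
In some embodiments, the first threshold condition and the second threshold condition can be checked at the same time. Alternatively, the first threshold condition can be checked first; when the target pixel gradient data satisfies the first threshold condition, the second threshold condition is not checked, which reduces computational complexity.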
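The sequential variant, in which the second condition is skipped once the first one holds, might look like this in Python. The three-valued result, including an "undecided" outcome for the case where neither condition holds, and all names are assumptions made for this sketch.

```python
from enum import Enum
from typing import Dict


class Decision(Enum):
    TERMINATE = "terminate"
    SPLIT = "split"
    UNDECIDED = "undecided"  # neither condition holds; another method must decide


def decide(gradients: Dict[str, float],
           first_thresholds: Dict[str, float], first_number: int,
           second_thresholds: Dict[str, float], second_number: int) -> Decision:
    below = sum(1 for k, g in gradients.items() if g < first_thresholds[k])
    if below >= first_number:
        return Decision.TERMINATE  # second condition is not evaluated
    above = sum(1 for k, g in gradients.items() if g > second_thresholds[k])
    if above >= second_number:
        return Decision.SPLIT
    return Decision.UNDECIDED
```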
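In some embodiments, the terminal device can decide, depending on the situation, whether to use the first threshold condition or the second threshold condition to determine the target partition decision result. For example, the threshold condition and the corresponding partition decision result can be chosen according to the number of sub-units that partitioning the target coding unit would produce. For quadtree (quad tree, QT) partitioning, it can be determined whether the target pixel gradient data satisfies the first threshold condition, for example whether all items of target pixel gradient data are smaller than the first threshold; if so, the target partition decision result for the target coding unit is to terminate partitioning. For ternary-tree (Triple Tree, TT) and binary-tree (Binary Tree, BT) partitioning, fewer sub-units are produced, so the condition can be relaxed: it can be determined whether all items of pixel gradient data are greater than the second threshold, and if so, the target partition decision result for the target coding unit is to partition.
In some embodiments, the partitioning modes of coding units can be set as needed; for example, all coding units can be set to use quadtree partitioning, or several partitioning modes can be allowed. For example, in the video standard VVC (Versatile Video Coding), coding units are partitioned with more complex and varied binary-tree, ternary-tree, and quadtree modes so that the amount of encoded data is smaller. A coding block (that is, a coding unit) can be at most 128x128 pixels and at least 4x4 pixels. A coding block can first be partitioned by quadtree, and binary-tree and ternary-tree partitioning is then performed on the quadtree leaf nodes; the minimum size of a quadtree leaf node is 16x16 pixels. As shown in FIG. 9, the CU blocks in each CTU can go through the decision flow shown in the figure: the recursive quadtree partitioning is performed first, and the leaf-node blocks of the quadtree are then traversed in turn over vertical and horizontal, binary-tree and ternary-tree partitioning, until the optimal partitioning of all CU blocks in the current CTU is found. The method provided in the embodiments of this application can therefore be used during QT (quadtree), TT (ternary-tree), and BT (binary-tree) partitioning to determine the partition decision result quickly. If the method cannot determine whether partitioning is needed, another partition depth decision method can be used to determine the partition depth. Here, MT (Multiple-type Tree) denotes multi-type tree partitioning, for example quadtree, binary tree, or ternary tree. MTdepth is the depth of MT partitioning and QTdepth is the depth of QT partitioning; each additional partitioning increases the depth by 1.
In some embodiments, step S302, that is, calculating the target pixel gradient data from the pixel values of the pixels of the target coding unit, includes: partitioning the target coding unit to obtain multiple sub coding units; and calculating the pixel gradient of each sub coding unit to obtain the target pixel gradient data, where a sub coding unit's pixel gradient is obtained from the differences between the pixel values of its pixels and a reference pixel value.
Specifically, the partitioning mode can be set as needed; for example, the unit can be partitioned into 4 or into 3. The partitioning mode can also be determined by the current stage of the partitioning flow: if the current decision is whether to apply ternary-tree partitioning to the target coding unit, the target coding unit can be partitioned into 3 sub coding units.
In some embodiments, the sub coding units obtained by partitioning either meet the coding unit size requirements of the video coding standard or are smaller than the minimum coding unit. For example, if the target coding unit is a non-square block and partitioning it into 2 coding units would often not meet the standard's coding unit size requirements, the target coding unit can instead be partitioned into multiple sub-blocks of the minimum coding unit size. As a concrete example, if the target coding unit is 16*8 pixels and the minimum coding unit is 4*4 pixels, the coding unit can be partitioned into 8 sub coding units of 4*4 pixels. The pixel gradient of a sub coding unit may include at least one of a global gradient or a local gradient. The pixel gradient data of the sub coding units can be used as the target pixel gradient data.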
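The 16*8 example can be checked with a short Python sketch that tiles a block into sub-blocks of the minimum coding unit size. The row-major block layout and the function name are assumptions for illustration.

```python
from typing import List

Block = List[List[int]]


def split_into_sub_blocks(block: Block, size: int = 4) -> List[Block]:
    """Tile a block into size x size sub-blocks, scanning left to right, top to bottom."""
    height, width = len(block), len(block[0])
    subs = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            subs.append([row[left:left + size] for row in block[top:top + size]])
    return subs


# A block 16 pixels wide and 8 pixels high yields 8 sub coding units of 4x4 pixels.
example = [[c + 16 * r for c in range(16)] for r in range(8)]
sub_units = split_into_sub_blocks(example, 4)
```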
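In some embodiments, determining the target partition decision result for the target coding unit from the target pixel gradient data includes: calculating the pixel gradient data differences between sub coding units, and determining the target partition decision result for the target coding unit from those differences.
Specifically, the gradient data difference can be obtained from a ratio or a difference, for example the absolute value of the difference or the square of the difference. The terminal device may calculate the pixel gradient data difference between every two sub coding units, or it may calculate at least one of the horizontal or vertical pixel gradient data differences. The horizontal pixel gradient data difference may be the difference between the pixel gradient data of a left sub-block (sub coding unit) and that of the adjacent right sub-block. The vertical pixel gradient data difference may be the difference between the pixel gradient data of an upper sub-block and that of the adjacent lower sub-block. For example, as shown in FIG. 10, suppose the target coding unit is divided into 4 sub coding units numbered 0, 1, 2, and 3. The horizontal pixel gradient data difference may then include at least one of the difference between sub coding units 0 and 1 or the difference between sub coding units 2 and 3. The vertical pixel gradient data difference may include at least one of the difference between sub coding units 0 and 2 or the difference between sub coding units 1 and 3.
The pixel gradient data difference between sub coding units can be expressed as a difference or as a ratio. For example, if the pixel gradient of sub coding unit 1 is x and the pixel gradient of sub coding unit 3 is y, the pixel gradient difference may be x divided by y, or y divided by x.
In some embodiments, when the pixel gradient data differences between sub coding units satisfy a third threshold condition, the target partition decision result for the target coding unit is determined to be to partition. The third threshold condition is that the number of pixel gradient data differences greater than their corresponding third thresholds reaches a third number.
Specifically, reaching the third number means being greater than or equal to the third number. The third threshold can be set as needed, for example from experience or experiment. For example, the target gradient data differences observed when partitioning is performed and when it is terminated can be collected over video frames with different content, and the gradient data difference value at which the decision to partition reaches a preset accuracy, for example 98%, is selected as the third threshold. The third threshold can also be adjusted adaptively according to the length and width of the coding block and their ratio. For example, when collecting these statistics, different coding block sizes can be counted separately to obtain thresholds for coding units of various sizes.
The third number can be set as needed. It can be the total number of pixel gradient data differences, or a part of it; for example, as soon as one of the pixel gradient data differences is greater than the third threshold, the target partition decision result for the target coding unit is to partition. When the pixel gradient data differences between sub coding units are large, the contents of the sub coding units differ considerably, so each needs its own reference block for prediction to obtain its predicted value and keep the prediction residual small. The partition decision result can therefore be to partition, which reduces the amount of encoded data.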
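A Python sketch of the third threshold condition for the four sub coding units of FIG. 10 follows. It assumes the ratio form of the difference, assumes sub-units 0 and 1 form the top row and 2 and 3 the bottom row, and treats a zero gradient as an unbounded ratio; the names are hypothetical.

```python
from typing import Sequence

# Adjacent pairs: horizontal (0,1) and (2,3); vertical (0,2) and (1,3).
ADJACENT_PAIRS = [(0, 1), (2, 3), (0, 2), (1, 3)]


def gradient_ratio(x: float, y: float) -> float:
    """Larger of x/y and y/x; a zero gradient is treated as an unbounded ratio (assumption)."""
    if x == 0 or y == 0:
        return float("inf")
    return max(x / y, y / x)


def should_split_by_difference(sub_gradients: Sequence[float],
                               third_threshold: float,
                               third_number: int = 1) -> bool:
    """True when at least `third_number` adjacent pairs differ by more than the threshold."""
    exceeding = sum(1 for a, b in ADJACENT_PAIRS
                    if gradient_ratio(sub_gradients[a], sub_gradients[b]) > third_threshold)
    return exceeding >= third_number
```

With `third_number=1`, a single strongly differing pair is enough to choose partitioning, which matches the "only one difference" example in the text.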
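In some embodiments, determining the target partition decision result for the target coding unit from the target pixel gradient data includes: obtaining a target number, which is the number of sub coding units whose pixel gradient data is greater than a fourth threshold; and when the target number exceeds a fourth number, or when the proportion of the target number to the number of sub coding units exceeds a first proportion, determining that the target partition decision result for the target coding unit is to partition.
Specifically, the first proportion can be set as needed, for example one quarter. The fourth number can also be set as needed, for example 5. When the target number exceeds the fourth number, or when its proportion of the number of sub coding units exceeds the first proportion, the image content within the sub coding units changes quickly. The target coding unit therefore needs to be partitioned so that encoding can be performed per sub coding unit, or so that the sub coding units can be partitioned further, to reduce the prediction residual.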
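The count-or-proportion rule can be sketched in Python as follows. The parameter names are illustrative; passing `fourth_number=None` disables the count test so that only the proportion test applies.

```python
from typing import Optional, Sequence


def should_split_by_count(sub_gradients: Sequence[float],
                          fourth_threshold: float,
                          fourth_number: Optional[int] = None,
                          first_proportion: float = 0.25) -> bool:
    """Split when enough sub coding units have a gradient above the fourth threshold."""
    target_number = sum(1 for g in sub_gradients if g > fourth_threshold)
    if fourth_number is not None and target_number > fourth_number:
        return True
    return target_number > first_proportion * len(sub_gradients)
```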
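The steps for determining the target partition decision result provided in the embodiments of this application can be applied before the step that decides the partition depth of a coding unit by rate-distortion cost. For example, determining coding units with a partitioning scheme that mixes several tree structures (such as quadtree, ternary tree, and binary tree) can improve coding efficiency, but it also increases encoding time considerably, which works against speeding up the encoder. The more modes are traversed, the worse this gets, because the intra prediction process must be run for every mode; intra prediction includes 67 intra directional predictions and many other intra prediction techniques, and the large number of rate-distortion optimization decisions greatly increases the encoder's computational complexity. By combining the structural characteristics of the binary tree, ternary tree, and quadtree with the content information of the coding block, it can be decided in advance whether the current CU block should definitely be partitioned or definitely not be partitioned. This effectively reduces the trial traversal of the various partition structures within the CU block, greatly saves encoding time, and makes encoding under the video standard more efficient.
In some embodiments, different decision methods can be provided for square blocks (that is, the target coding unit is square) and non-square blocks. The following pseudocode shows a first decision method for square blocks. Thr_G is the threshold for global gradients and Thr_L is the threshold for local gradients. "&&" is the logical AND operator and "||" is the logical OR operator. a and b can be set from experience or experiment. They can also be obtained statistically: when partition decisions are made by rate-distortion cost, the a and b values associated with terminating partitioning are collected, and values that make the accuracy of the terminate-partitioning decision higher than a preset accuracy are chosen. The values of a and b can also be adjusted adaptively according to the size of the target coding unit. For example, a and b may be 1.414 and 1.2, respectively.
getRatioFlag checks a ratio: it returns true when the ratio between its two arguments exceeds b. GlobalGradientSubBlock_Ver[H] is the global gradient difference of sub coding unit H; for example, GlobalGradientSubBlock_Ver[1] is the global gradient difference of sub coding unit 1. According to the following code, when the local gradients are all smaller than their corresponding first thresholds and the global gradient differences are all smaller than their corresponding first thresholds, the target partition decision result is not to partition. If termination cannot be decided with certainty, the target coding unit can be divided into 4 sub-blocks, and the horizontal and vertical global pixel gradient data differences are calculated. If the ratios of the global pixel gradient data differences between sub-blocks are greater than b, partitioning is definitely performed.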
// If the horizontal and vertical global gradient differences are below the global threshold,
// and the horizontal and vertical local gradients are below the local threshold
if (GlobalGradient_Hor < Thr_G && LocalGradient_Hor < Thr_L && GlobalGradient_Ver < Thr_G && LocalGradient_Ver < Thr_L) {
    // If the 45-degree and 135-degree global gradient differences are below a times the global threshold,
    // and the 45-degree and 135-degree local gradients are below a times the local threshold
    if (GlobalGradient_45 < a * Thr_G && LocalGradient_45 < a * Thr_L && GlobalGradient_135 < a * Thr_G && LocalGradient_135 < a * Thr_L) {
        NotSplit = 1; // Definitely do not partition: the decision result is to terminate partitioning.
    }
} else { // Otherwise, perform the following computation
    // Ratio check on the horizontal global gradient differences: sub coding units 0 and 1, or 2 and 3.
    int HorFlag = getRatioFlag(GlobalGradientSubBlock_Hor[0], GlobalGradientSubBlock_Hor[1]) || getRatioFlag(GlobalGradientSubBlock_Hor[2], GlobalGradientSubBlock_Hor[3]);
    // Ratio check on the vertical global gradient differences: sub coding units 0 and 2, or 1 and 3.
    int VerFlag = getRatioFlag(GlobalGradientSubBlock_Ver[0], GlobalGradientSubBlock_Ver[2]) || getRatioFlag(GlobalGradientSubBlock_Ver[1], GlobalGradientSubBlock_Ver[3]);
    if (HorFlag && VerFlag) {
        DoSplit = 1; // Both the horizontal and the vertical ratio checks exceed b, so the decision result is to partition.
    }
}
bool getRatioFlag(double x, double y) {
    if (x == 0 || y == 0) return true;
    if (x / y > b || y / x > b) return true; // True if x divided by y, or y divided by x, is greater than b.
    else return false; // Otherwise false.
}
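In some embodiments, the following pseudocode shows a second decision method for square blocks. The second decision method can be used for binary-tree and ternary-tree partitioning, so as to relax the decision condition. Thr_G_Limit is the gradient threshold for global gradients and may be the same as or different from Thr_G. Thr_L_Limit is the gradient threshold for local gradients and may be the same as or different from Thr_L.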
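// If the smallest of the horizontal, vertical, 45-degree, and 135-degree global pixel gradient differences
// exceeds the global gradient threshold, or the smallest of the four local pixel gradients exceeds
// the local gradient threshold, the decision result is to partition.
if (min{GlobalGradient_Hor, GlobalGradient_Ver, GlobalGradient_45, GlobalGradient_135} > Thr_G_Limit || min{LocalGradient_Hor, LocalGradient_Ver, LocalGradient_45, LocalGradient_135} > Thr_L_Limit) {
    DoSplit = 1;
}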
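In some embodiments, the following pseudocode shows a first decision method for non-square blocks. GlobalGradient_Hor_Left, GlobalGradient_Hor_Right, GlobalGradient_Ver_Above, and GlobalGradient_Ver_Below are the global gradient differences of the left half, the right half, the upper half, and the lower half, respectively.
a_hor may be a preset value and can be determined in the same way as the preset value a.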
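if (Width / Height > 2) { // The ratio of width to height is greater than 2
    // If the left half's global gradient difference exceeds a_hor times the right half's,
    // or the right half's exceeds a_hor times the left half's, the decision result is to partition.
    if (GlobalGradient_Hor_Left > a_hor * GlobalGradient_Hor_Right || GlobalGradient_Hor_Right > a_hor * GlobalGradient_Hor_Left) {
        DoSplit = 1;
    }
}
if (Height / Width > 2) { // The ratio of height to width is greater than 2
    // If the upper half's global gradient difference exceeds a_hor times the lower half's,
    // or the lower half's exceeds a_hor times the upper half's, the decision result is to partition.
    if (GlobalGradient_Ver_Above > a_hor * GlobalGradient_Ver_Below || GlobalGradient_Ver_Below > a_hor * GlobalGradient_Ver_Above) {
        DoSplit = 1;
    }
}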
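The first non-square decision method above can be rendered as a small runnable Python function. This is an illustrative sketch: the function signature is hypothetical, and the four half-block gradient values are assumed to have been computed beforehand.

```python
def non_square_first_decision(width: int, height: int,
                              g_left: float, g_right: float,
                              g_above: float, g_below: float,
                              a_hor: float) -> bool:
    """Return True when the pseudocode would set DoSplit for a non-square block."""
    if width / height > 2:
        # Wide block: compare the left and right halves.
        if g_left > a_hor * g_right or g_right > a_hor * g_left:
            return True
    if height / width > 2:
        # Tall block: compare the upper and lower halves.
        if g_above > a_hor * g_below or g_below > a_hor * g_above:
            return True
    return False
```

Because both aspect-ratio tests are strict, a block whose ratio is exactly 2 (for example 16x8) is never decided by this method and falls through to a later method.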
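In some embodiments, the following pseudocode shows a second decision method for non-square blocks. It can be used when the first decision method for non-square blocks cannot determine whether partitioning is definitely needed, so as to perform a finer-grained computation. The target coding unit can be partitioned at the minimum coding unit size, for example 4*4 pixels, and the local gradient of each sub coding unit is calculated as pixel-by-pixel differences. subBlkNum is the number of sub coding units. Thr_Sub_Limit is a gradient threshold: the gradient value of every sub-block in the target coding unit is traversed and the number of sub-blocks whose gradient is greater than Thr_Sub_Limit is counted. If more than 1/4 of the sub-blocks have a gradient greater than Thr_Sub_Limit, the partition decision result for the block is to partition.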
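int count = 0;
for (int i = 0; i < subBlkNum; i++) {
    if (Gradient_subBlk[i] > Thr_Sub_Limit) count++;
    // Integer-safe form of count > subBlkNum / 4: more than a quarter of the sub-blocks exceed the threshold.
    if (count * 4 > subBlkNum) {
        DoSplit = 1;
        break;
    }
}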
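The video encoding method provided in this application is described below with a specific embodiment:
1. Receive a video image from a video sequence and decompose it into one or more coding units of a preset size, which serve as target coding units.
2. Divide the target coding unit into multiple regions, calculate the pixel gradient of each region, and calculate the global pixel gradient differences between regions.
For example, the target coding unit can be divided horizontally, vertically, and diagonally. The global pixel gradient of each region is calculated, and the gradient differences of the global pixel gradients for the horizontal, vertical, 45-degree, and 135-degree directions are calculated according to formulas (5) to (8).
3. Calculate the local pixel gradients of the target coding unit in each direction.
For example, the gradient calculation directions may include the horizontal, vertical, 45-degree, and 135-degree directions. The local pixel gradients in these directions can be calculated according to formulas (1) to (4).
4. Determine whether each global pixel gradient difference is smaller than its corresponding first threshold and whether each local pixel gradient is smaller than its corresponding first threshold. If so, the target partition decision result is to terminate partitioning. If not, go to step 5.
5. Partition the target coding unit into 4 sub coding units, calculate the global pixel gradient of each sub coding unit, and calculate the vertical and horizontal global pixel gradient data differences.
For example, the horizontal pixel gradient data difference may include at least one of the difference between sub coding units 0 and 1 or the difference between sub coding units 2 and 3. The vertical pixel gradient data difference may include at least one of the difference between sub coding units 0 and 2 or the difference between sub coding units 1 and 3.
6. When the pixel gradient data differences between sub coding units satisfy the third threshold condition, determine that the target partition decision result for the target coding unit is to partition. If the condition is not satisfied, go to step 7.
For example, if at least one of the pixel gradient data differences between sub coding units 0 and 1, between 2 and 3, between 0 and 2, or between 1 and 3 is greater than the third threshold, the target partition decision result for the target coding unit is to partition.
7. Determine the partition depth of the target coding unit according to rate-distortion cost.
For example, a first rate-distortion cost can be calculated for encoding the target coding unit as a whole block, and a second rate-distortion cost for encoding it as 4 sub-blocks. If the first rate-distortion cost is larger, the partition decision result is to partition. If the second rate-distortion cost is larger, the partition decision result is to terminate partitioning.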
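Steps 4 to 7 form a cascade in which the costly rate-distortion comparison runs only when the gradient tests are inconclusive. The Python sketch below shows that control flow under simplifying assumptions: the two gradient tests are passed in as already-evaluated booleans, the rate-distortion costs come from a caller-supplied function, and equal costs are treated as "terminate" because the text does not specify the tie case. All names are hypothetical.

```python
from typing import Callable, Tuple


def partition_decision(early_terminate: bool,
                       split_by_difference: bool,
                       rd_costs: Callable[[], Tuple[float, float]]) -> str:
    """Return "terminate" or "split"; rd_costs() is invoked only as a fallback."""
    if early_terminate:        # step 4: all gradients below their first thresholds
        return "terminate"
    if split_by_difference:    # step 6: third threshold condition satisfied
        return "split"
    whole_cost, split_cost = rd_costs()  # step 7: (first, second) rate-distortion cost
    return "split" if whole_cost > split_cost else "terminate"
```

Passing the cost computation as a callable keeps it lazy, which mirrors the motivation of skipping rate-distortion evaluation whenever a gradient test already decides the outcome.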
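In some embodiments, the method in the embodiments of this application can be used for block partition (coding unit partition) decisions for both luma blocks and chroma blocks.
As shown in FIG. 11, in some embodiments a video encoding apparatus is provided. The video encoding apparatus can be integrated into the terminal device described above and may specifically include a target coding unit obtaining module 1102, a target pixel gradient data obtaining module 1104, a target partition decision result determining module 1106, and a video encoding module 1108.
The target coding unit obtaining module 1102 is configured to obtain a target coding unit to be encoded.
The target pixel gradient data obtaining module 1104 is configured to calculate target pixel gradient data from the pixel values of the pixels of the target coding unit, where the target pixel gradient data is obtained from the differences between the pixel values of the pixels and a reference pixel value.
The target partition decision result determining module 1106 is configured to determine the target partition decision result for the target coding unit from the target pixel gradient data.
The video encoding module 1108 is configured to perform video encoding on the target coding unit according to the target partition decision result.
In some embodiments, the target pixel gradient data obtaining module 1104 includes:
A reference pixel value obtaining unit, configured to determine a current pixel in the target coding unit and obtain the pixel value of the target adjacent pixel of the current pixel as the reference pixel value.
A pixel value difference calculation unit, configured to calculate the difference between the pixel value of the current pixel and the reference pixel value to obtain the pixel value difference of the current pixel.
A target pixel gradient data obtaining unit, configured to compute statistics over the pixel value differences of the pixels in the target coding unit to obtain the target pixel gradient data.
In some embodiments, the unit that determines the target adjacent pixel of the current pixel is configured to: obtain a target gradient calculation direction; and obtain the pixel adjacent to the current pixel in the target gradient calculation direction as the target adjacent pixel.
In some embodiments, the target pixel gradient data obtaining module 1104 includes:
A pixel gradient calculation unit, configured to divide the target coding unit into multiple regions and calculate the pixel gradient of each region, where a region's pixel gradient is obtained from the differences between the pixel values of the region's pixels and a reference pixel value.
A gradient difference calculation unit, configured to calculate the pixel gradient differences between regions to obtain the target pixel gradient data.
In some embodiments, the gradient difference calculation unit is configured to subtract the pixel gradient of a second region from the pixel gradient of a first region to obtain the pixel gradient difference between the first region and the second region.
In some embodiments, the target partition decision result determining module 1106 is configured to: when the target pixel gradient data satisfies a first threshold condition, determine that the target partition decision result for the target coding unit is to terminate partitioning, where the first threshold condition is that the number of items of target pixel gradient data smaller than their corresponding first thresholds is greater than or equal to a first number.
In some embodiments, the target partition decision result determining module 1106 is configured to: when the target pixel gradient data satisfies a second threshold condition, determine that the target partition decision result for the target coding unit is to partition, where the second threshold condition is that the number of items of target pixel gradient data greater than their corresponding second thresholds is greater than or equal to a second number.
In some embodiments, the target pixel gradient data obtaining module 1104 includes:
A partitioning unit, configured to partition the target coding unit to obtain multiple sub coding units.
A sub coding unit gradient data calculation unit, configured to calculate the pixel gradient data of each sub coding unit to obtain the target pixel gradient data, where a sub coding unit's pixel gradient is obtained from the differences between the pixel values of its pixels and a reference pixel value.
In some embodiments, the target partition decision result determining module 1106 is configured to calculate the pixel gradient data differences between sub coding units and determine the target partition decision result for the target coding unit from those differences.
In some embodiments, the target partition decision result determining module 1106 is configured to: when the pixel gradient data differences between sub coding units satisfy a third threshold condition, determine that the target partition decision result for the target coding unit is to partition, where the third threshold condition is that the number of pixel gradient data differences greater than their corresponding third thresholds is greater than or equal to a third number.
In some embodiments, the target partition decision result determining module 1106 is configured to: obtain a target number of sub coding units whose pixel gradient data is greater than a fourth threshold; and when the target number exceeds a fourth number, or when the proportion of the target number to the number of sub coding units exceeds a first proportion, determine that the target partition decision result for the target coding unit is to partition.
In some embodiments, the video encoding module 1108 is configured to: when the target partition decision result is to terminate partitioning, perform video encoding with the target coding unit as the coding unit; and when the target partition decision result is to partition, partition the target coding unit to obtain multiple sub coding units and perform video encoding according to the sub coding units.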
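FIG. 12 shows the internal structure of a computer device in some embodiments. The computer device may specifically be the terminal device in FIG. 1. As shown in FIG. 12, the computer device includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store computer-readable instructions that, when executed by the processor, cause the processor to implement the video encoding method. The internal memory may also store computer-readable instructions that, when executed by the processor, cause the processor to perform the video encoding method. The display screen of the computer device may be a liquid crystal display or an electronic ink display. The input device of the computer device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse.
A person skilled in the art can understand that the structure shown in FIG. 12 is only a block diagram of the part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In some embodiments, the video encoding apparatus provided in this application can be implemented in the form of computer-readable instructions that can run on the computer device shown in FIG. 12. The computer device may be a terminal or a server, and its memory can store the program modules that make up the video encoding apparatus, for example the target coding unit obtaining module 1102, the target pixel gradient data obtaining module 1104, the target partition decision result determining module 1106, and the video encoding module 1108 shown in FIG. 11. The computer-readable instructions formed by these program modules cause the processor to perform the steps of the video encoding method of the embodiments of this application described in this specification.
For example, the computer device shown in FIG. 12 can obtain the target coding unit to be encoded through the target coding unit obtaining module 1102 of the video encoding apparatus shown in FIG. 11. Through the target pixel gradient data obtaining module 1104, it calculates the target pixel gradient data from the pixel values of the pixels of the target coding unit, where the target pixel gradient data is obtained from the differences between the pixel values of the pixels and a reference pixel value. Through the target partition decision result determining module 1106, it determines the target partition decision result for the target coding unit from the target pixel gradient data. Through the video encoding module 1108, it performs video encoding on the target coding unit according to the target partition decision result.
In some embodiments, a computer device is provided, including a memory and a processor. The memory stores computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the video encoding method described above. These steps may be the steps of the video encoding method of any of the embodiments above.
In some embodiments, a computer-readable storage medium is provided that stores computer-readable instructions that, when executed by a processor, cause the processor to perform the steps of the video encoding method described above. These steps may be the steps of the video encoding method of any of the embodiments above.
In one embodiment, a computer program product or computer program is provided. The computer program product or computer program includes computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the steps of the method embodiments above.
It should be understood that although the steps in the flowcharts of the embodiments of this application are shown in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict order restriction on these steps, and they can be performed in other orders. Moreover, at least some of the steps in each embodiment may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times; nor are they necessarily performed sequentially, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
A person of ordinary skill in the art can understand that all or part of the procedures of the method embodiments above can be completed by computer-readable instructions instructing the relevant hardware. The computer-readable instructions can be stored in a non-volatile computer-readable storage medium and, when executed, may include the procedures of the method embodiments above. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the embodiments above can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the embodiments above are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The embodiments above express only several implementations of this application, and their description is relatively specific and detailed, but they should not therefore be understood as limiting the patent scope of this application. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. The protection scope of this patent application shall therefore be subject to the appended claims.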
Claims (15)
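1. A video encoding method, performed by a computer device, the method comprising: obtaining a target coding unit to be encoded; calculating target pixel gradient data from the pixel values of the pixels of the target coding unit, the target pixel gradient data being obtained from the differences between the pixel values of the pixels and a reference pixel value; determining a target partition decision result for the target coding unit from the target pixel gradient data; and performing video encoding on the target coding unit according to the target partition decision result.
2. The method according to claim 1, wherein calculating the target pixel gradient data from the pixel values of the pixels of the target coding unit comprises: determining a current pixel in the target coding unit, and obtaining the pixel value of a target adjacent pixel of the current pixel as the reference pixel value; calculating the difference between the pixel value of the current pixel and the reference pixel value to obtain the pixel value difference of the current pixel; and computing statistics over the pixel value differences of the pixels in the target coding unit to obtain the target pixel gradient data.
3. The method according to claim 2, wherein determining the target adjacent pixel of the current pixel comprises: obtaining a target gradient calculation direction; and obtaining the pixel adjacent to the current pixel in the target gradient calculation direction as the target adjacent pixel.
4. The method according to claim 1, wherein calculating the target pixel gradient data from the pixel values of the pixels of the target coding unit comprises: dividing the target coding unit into multiple regions, and calculating the pixel gradient of each region, the pixel gradient of a region being obtained from the differences between the pixel values of the region's pixels and a reference pixel value; and calculating the pixel gradient differences between the regions to obtain the target pixel gradient data.
5. The method according to claim 4, wherein calculating the pixel gradient differences between the regions comprises: subtracting the pixel gradient of a second region from the pixel gradient of a first region to obtain the pixel gradient difference between the first region and the second region.
6. The method according to claim 2 or 4, wherein determining the target partition decision result for the target coding unit from the target pixel gradient data comprises: when the target pixel gradient data satisfies a first threshold condition, determining that the target partition decision result for the target coding unit is to terminate partitioning, the first threshold condition being that the number of items of target pixel gradient data smaller than their corresponding first thresholds is greater than or equal to a first number.
7. The method according to claim 2 or 4, wherein determining the target partition decision result for the target coding unit from the target pixel gradient data comprises: when the target pixel gradient data satisfies a second threshold condition, determining that the target partition decision result for the target coding unit is to partition, the second threshold condition being that the number of items of target pixel gradient data greater than their corresponding second thresholds is greater than or equal to a second number.
8. The method according to claim 1, wherein calculating the target pixel gradient data from the pixel values of the pixels of the target coding unit comprises: partitioning the target coding unit to obtain multiple sub coding units; and calculating the pixel gradient data of each sub coding unit to obtain the target pixel gradient data, the pixel gradient of a sub coding unit being obtained from the differences between the pixel values of its pixels and a reference pixel value.
9. The method according to claim 8, wherein determining the target partition decision result for the target coding unit from the target pixel gradient data comprises: calculating the pixel gradient data differences between the sub coding units, and determining the target partition decision result for the target coding unit from the pixel gradient data differences between the sub coding units.
10. The method according to claim 9, wherein determining the target partition decision result for the target coding unit from the pixel gradient data differences between the sub coding units comprises: when the pixel gradient data differences between the sub coding units satisfy a third threshold condition, determining that the target partition decision result for the target coding unit is to partition, the third threshold condition being that the number of pixel gradient data differences greater than their corresponding third thresholds is greater than or equal to a third number.
11. The method according to claim 8, wherein determining the target partition decision result for the target coding unit from the target pixel gradient data comprises: obtaining a target number of sub coding units whose pixel gradient data is greater than a fourth threshold; and when the target number exceeds a fourth number, or when the proportion of the target number to the number of sub coding units exceeds a first proportion, determining that the target partition decision result for the target coding unit is to partition.
12. The method according to claim 1, wherein performing video encoding on the target coding unit according to the target partition decision result comprises: when the target partition decision result is to terminate partitioning, performing video encoding with the target coding unit as the coding unit; and when the target partition decision result is to partition, partitioning the target coding unit to obtain multiple sub coding units and performing video encoding according to the sub coding units.
13. A video encoding apparatus, comprising: a target coding unit obtaining module, configured to obtain a target coding unit to be encoded; a target pixel gradient data obtaining module, configured to calculate target pixel gradient data from the pixel values of the pixels of the target coding unit, the target pixel gradient data being obtained from the differences between the pixel values of the pixels and a reference pixel value; a target partition decision result determining module, configured to determine a target partition decision result for the target coding unit from the target pixel gradient data; and a video encoding module, configured to perform video encoding on the target coding unit according to the target partition decision result.
14. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the video encoding method according to any one of claims 1 to 12.
15. One or more non-volatile storage media storing computer-readable instructions that, when executed by one or more processors, cause the processors to perform the steps of the video encoding method according to any one of claims 1 to 12.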
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/512,486 US11949879B2 (en) | 2019-10-22 | 2021-10-27 | Video coding method and apparatus, computer device, and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911005290.8 | 2019-10-22 | ||
CN201911005290.8A CN112702603A (zh) | 2019-10-22 | 2019-10-22 | Video coding method and apparatus, computer device, and storage medium |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/512,486 Continuation US11949879B2 (en) | 2019-10-22 | 2021-10-27 | Video coding method and apparatus, computer device, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021077914A1 (zh) | 2021-04-29 |
Family
ID=75504751
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/113153 WO2021077914A1 (zh) | 2019-10-22 | 2020-09-03 | Video coding method and apparatus, computer device, and storage medium |
Country Status (3)
Country | Link |
---|---|
US (1) | US11949879B2 (zh) |
CN (1) | CN112702603A (zh) |
WO (1) | WO2021077914A1 (zh) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
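CN113747177B (zh) * | 2021-08-05 | 2023-06-20 | 中山大学 | Method, apparatus, and medium for intra-coding speed optimization based on historical information |
CN113784140B (zh) * | 2021-09-15 | 2023-11-07 | 深圳市朗强科技有限公司 | Mathematically lossless coding method and device |
CN116897538A (zh) * | 2022-02-03 | 2023-10-17 | 梦芯片技术股份有限公司 | Gradient-based pixel-by-pixel image spatial prediction |
CN115209147B (zh) * | 2022-09-15 | 2022-12-27 | 深圳沛喆微电子有限公司 | Camera video transmission bandwidth optimization method, apparatus, device, and storage medium |
CN116010642B (zh) * | 2023-03-27 | 2023-06-20 | 北京滴普科技有限公司 | Fast seal query method and system based on HOG features |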
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
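CN104363450A (zh) * | 2014-11-27 | 2015-02-18 | 北京奇艺世纪科技有限公司 | Intra-coding mode decision method and apparatus |
CN105721865A (zh) * | 2016-02-01 | 2016-06-29 | 同济大学 | Fast decision algorithm for HEVC inter coding unit partitioning |
CN107071418A (zh) * | 2017-05-05 | 2017-08-18 | 上海应用技术大学 | Decision-tree-based fast partitioning method for HEVC intra coding units |
CN109963151A (zh) * | 2017-12-14 | 2019-07-02 | 腾讯科技(深圳)有限公司 | Coding unit partition determination method and apparatus, terminal device, and readable storage medium |
CN110351556A (zh) * | 2018-04-02 | 2019-10-18 | 腾讯科技(北京)有限公司 | Method for determining the coding cost of a coding unit and related apparatus |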
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7031393B2 (en) * | 2000-10-20 | 2006-04-18 | Matsushita Electric Industrial Co., Ltd. | Block distortion detection method, block distortion detection apparatus, block distortion removal method, and block distortion removal apparatus |
TW594674B (en) * | 2003-03-14 | 2004-06-21 | Mediatek Inc | Encoder and a encoding method capable of detecting audio signal transient |
KR101798079B1 (ko) * | 2010-05-10 | 2017-11-16 | 삼성전자주식회사 | Method and apparatus for encoding an image frame using differences of pixel values |
WO2012044124A2 (ko) * | 2010-09-30 | 2012-04-05 | 한국전자통신연구원 | Image encoding method, image decoding method, and image encoding apparatus and image decoding apparatus using the same |
KR101343554B1 (ko) * | 2012-07-06 | 2013-12-20 | 인텔렉추얼디스커버리 주식회사 | Image retrieval method and apparatus |
WO2014120368A1 (en) * | 2013-01-30 | 2014-08-07 | Intel Corporation | Content adaptive entropy coding for next generation video |
WO2014146219A1 (en) * | 2013-03-22 | 2014-09-25 | Qualcomm Incorporated | Depth modeling modes for depth map intra coding |
JP6090430B2 (ja) * | 2013-03-26 | 2017-03-08 | 富士通株式会社 | Encoding device, method, program, computer system, and recording medium |
BR112016015109A2 (pt) * | 2013-12-30 | 2017-08-08 | Qualcomm Inc | Simplification of delta DC residual coding in 3D video coding |
CN104023241B (zh) * | 2014-05-29 | 2017-08-04 | 华为技术有限公司 | Video encoding method and video encoding apparatus for intra-prediction coding |
US20150365703A1 (en) * | 2014-06-13 | 2015-12-17 | Atul Puri | System and method for highly content adaptive quality restoration filtering for video coding |
KR20180107153A (ko) * | 2016-02-16 | 2018-10-01 | 삼성전자주식회사 | Image encoding method and apparatus, and image decoding method and apparatus |
CN117615147A (zh) * | 2016-07-14 | 2024-02-27 | 三星电子株式会社 | Video decoding method and apparatus therefor, and video encoding method and apparatus therefor |
US10999576B2 (en) * | 2017-05-03 | 2021-05-04 | Novatek Microelectronics Corp. | Video processing method |
CN110999304B (zh) * | 2017-07-28 | 2023-12-08 | 韩国电子通信研究院 | Image processing method, image encoding/decoding method, and apparatus using the image processing method and the image encoding/decoding method |
WO2019143093A1 (ko) * | 2018-01-16 | 2019-07-25 | 삼성전자주식회사 | Video decoding method and apparatus, and video encoding method and apparatus |
CN112087624A (zh) * | 2019-06-13 | 2020-12-15 | 深圳市中兴微电子技术有限公司 | Coding management method based on High Efficiency Video Coding |
2019
- 2019-10-22 CN CN201911005290.8A patent/CN112702603A/zh active Pending
2020
- 2020-09-03 WO PCT/CN2020/113153 patent/WO2021077914A1/zh active Application Filing
2021
- 2021-10-27 US US17/512,486 patent/US11949879B2/en active Active
Non-Patent Citations (1)
Title |
---|
CAIXIA BAI; CHUN YUAN: "Fast coding tree unit decision for HEVC intra coding", 2013 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - CHINA, 11 April 2013 (2013-04-11), pages 28 - 31, XP032583169, DOI: 10.1109/ICCE-China.2013.6780861 * |
Also Published As
Publication number | Publication date |
---|---|
CN112702603A (zh) | 2021-04-23 |
US11949879B2 (en) | 2024-04-02 |
US20220053198A1 (en) | 2022-02-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20878400; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 15/06/2022) |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20878400; Country of ref document: EP; Kind code of ref document: A1 |