1 Introduction
With the rapid development of information technology, video has been applied in entertainment, surveillance, education, and many other fields. To serve more applications in daily life, video has evolved along several dimensions over the last decade, including High Definition (HD), Wide Color Gamut (WCG) [14], High Dynamic Range (HDR) [14], Multi-view Video plus Depth (MVD) [28], 360-degree video [43], light field image/video [11], and dynamic point cloud [33]. Unfortunately, as video moves from low to high dimension, the dramatically increased data volume challenges the limited storage space and transmission bandwidth. From H.264/Advanced Video Coding (AVC) [39] and High Efficiency Video Coding (HEVC) [35] to the state-of-the-art Versatile Video Coding (VVC) [7], issued in 2020, a large compression ratio has been achieved, yet it still cannot keep up with the growth of video data. Advanced video compression algorithms are therefore always desired to maximize visual quality at a given bandwidth budget.
In the existing hybrid video coding framework, the main modules are intra/inter prediction, transform, quantization, entropy encoding, and in-loop filtering. To improve compression efficiency, a variety of novel coding tools have been developed in the issued standards, including the QuadTree plus Multi-Type Tree (QT+MTT) structure [17] for coding block partition, Matrix-based Intra Prediction (MIP) [34] and Cross-Component Linear Model (CCLM) [45] for intra luma and chroma prediction, History-based Motion Vector Prediction (HMVP) [46] and Decoder-side MV Refinement (DMVR) [15] for motion estimation/compensation, Multiple Transform Selection (MTS) [8] for transform, the CABAC engine with multi-hypothesis probability estimation for entropy encoding, and Sample Adaptive Offset (SAO) and Adaptive Loop Filter (ALF) for in-loop filtering. These coding tools have achieved significant coding gains.
One of the most important modules is intra prediction [19], which aims to remove spatial redundancy as much as possible. Parts of the available neighboring blocks are weighted to produce the predicted block. Traditionally, intra modes include Planar, DC, and angular modes. To achieve more accurate prediction, various algorithms have been developed. In [4], intra prediction was analyzed in the frequency domain, and frequency components were selectively discarded to improve performance. Li et al. [20] presented a bi-intra prediction method based on the binary combination of existing uni-intra prediction modes. Rather than the regular out-block reference pixels, in-block ones were employed in [2] to perform intra prediction for screen content, and an additional in-loop residual signal was used. An iterative filtering method was employed in [10] in addition to the traditional intra prediction. To obtain more reference pixels, a multi-line based scheme was presented in [21], where six more lines of pixels from the above and left neighbors were collected. Instead of a fixed scan order, an adaptive block coding order [49] was proposed for intra prediction to better exploit spatial correlations. Analogous to motion estimation in inter coding, Intra Block Copy (IBC) [42] was introduced for screen content to exploit long-distance correlations within an image. In [1], two modes with high probability in the gradient histogram were combined to generate a new intra mode. In [47], local and nonlocal correlations were exploited for hybrid intra prediction, where adaptive template matching prediction, combined local and nonlocal prediction, and combined neighboring modes prediction were performed. The methods above exploit spatial redundancy from neighbors with manually designed functions, which may limit their performance. Advanced schemes are desired to adapt to diverse video content.
To further improve the compression efficiency of intra coding, the signal processing problem has been formulated as an artificial intelligence task, where powerful neural networks are adopted [25, 48], and a training database for deep video compression is provided in [24]. Specifically, intra luma prediction was formulated as an inpainting task [50], and intra chroma prediction was modeled as a colorization task [23, 52]. An iterative training strategy for neural networks was presented in [12], where training blocks were collected from the previous iteration to further improve performance. Wang et al. [38] proposed a multi-scale convolutional network based intra prediction approach, in which the neighboring reconstructed L-shape, together with the traditional angular intra prediction result, was fed to the network to make a more accurate prediction. With a conditional autoencoder [6], multi-mode intra prediction was performed for luma and chroma components. Sun et al. [36] proposed two enhanced intra prediction schemes with multiple neural networks, where the appending scheme was to replace the traditional modes and the substitution scheme was to replace the highest and lowest probable traditional modes. In [16], a progressive spatial recurrent neural network was presented for intra prediction, which produces the prediction by passing information along from previous outputs. To adapt to variable coding block sizes in intra prediction, fully connected and convolutional neural networks were carefully designed [13] for small and large blocks, respectively. Most existing learning based methods aim to make more accurate luma and chroma predictions from a regression perspective to achieve coding gains, while the intra mode derivation module has not been exploited from a classification perspective with deep learning tools.
In intra coding, besides the residual signal, the intra mode must also be encoded and transmitted to the decoder. For intra mode signaling, the Most Probable Mode (MPM) list, which is constructed from the neighboring blocks, plays an important role and saves significant coding bits. In [18], two MPM construction methods were presented for VVC, where one was extended from HEVC and the other was sorted according to the probability of each candidate. Besides the nearest neighboring lines, Chang et al. [9] extended the MPM mechanism to the Multi-Reference Line (MRL) scheme for better performance. A conditional random field model was established to reconstruct the MPM list in [22], where short- and long-range correlations were considered. In addition, a decision tree was utilized to exploit multiple dynamic lists for intra mode signaling [32]. By investigating the occurrences of intra modes in the neighboring blocks, the Most Frequent Mode (MFM) list [44] was derived to compete with the existing MPM list. To skip intra mode signaling and save coding bits, Xu et al. [41] proposed a predictive coding scheme, in which the angular correlation in the spatial domain was calculated with modulo-N arithmetic operations. Additionally, template based [40], histogram of gradients based [30], and texture analysis based [29] intra mode derivation methods were presented, all of them manually designed. For depth video coding, a coding tool [27] was presented to reduce the intra mode signaling bitrate, in which the texture intra modes were inherited for the depth intra modes. Overall, MPM list construction and intra mode derivation have so far been investigated with traditional statistics and experience, and can be further improved with advanced learning based schemes.
In this work, to skip the module of intra mode signaling and save coding bits, the process of intra mode derivation is formulated as a multi-class classification task. The main contributions of this work are listed as follows.
(1) The process of intra mode derivation in intra coding is modeled as a multi-class classification task, termed Deep Learning based Intra Mode Derivation (DLIMD), which is used to skip intra mode signaling and thus save bits.
(2) In DLIMD, learned features and hand-crafted features are combined for intra mode derivation. Additionally, the proposed DLIMD can be applied to variable coding blocks (including non-square blocks) and to different Quantization Parameter (QP) settings.
(3) To further improve the performance, one additional binary flag is utilized to indicate the scheme finally selected by Rate Distortion (RD) cost competition. The proposed method achieves superior performance when compared with the state-of-the-art algorithms.
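The RD cost competition in contribution (3) can be illustrated with a generic sketch. It is not the paper's implementation: it assumes the usual Lagrangian cost J = D + lambda * R, assumes that both candidates pay one extra bit for the flag, and assumes that flag value 1 means the derived mode is used. The names `Candidate`, `rd_cost`, and `choose_scheme`, and all numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One way of coding a block (hypothetical container for illustration)."""
    distortion: float  # e.g. sum of squared errors of the reconstruction
    bits: float        # coding bits, excluding the one-bit scheme flag


def rd_cost(cand: Candidate, lam: float, flag_bits: float = 1.0) -> float:
    """Lagrangian RD cost J = D + lambda * R, with the flag counted in R."""
    return cand.distortion + lam * (cand.bits + flag_bits)


def choose_scheme(signaled: Candidate, derived: Candidate, lam: float) -> int:
    """Return the binary flag: 1 if the derived-mode scheme wins, else 0.

    Assumption: ties go to conventional signaling (flag 0).
    """
    return 1 if rd_cost(derived, lam) < rd_cost(signaled, lam) else 0


# Hypothetical block: deriving the mode saves 3 mode bits but picks a
# slightly worse predictor, so distortion is a little higher.
signaled = Candidate(distortion=100.0, bits=50.0)
derived = Candidate(distortion=104.0, bits=47.0)

flag_high_lambda = choose_scheme(signaled, derived, lam=2.0)  # bits are expensive
flag_low_lambda = choose_scheme(signaled, derived, lam=0.5)   # distortion dominates
```

With these made-up numbers the derived scheme wins when lambda is large (bit savings outweigh the extra distortion) and loses when lambda is small, which is why a per-block flag can help.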
The remainder of this work is organized as follows. Motivation is presented in Section 2. The proposed DLIMD for video coding is discussed in detail in Section 3. The experiments are conducted and the results are analyzed in Section 4. Section 5 concludes this work.
2 Motivation
In VVC, intra coding modes/tools [31] include DC, Planar, 65 angular modes, Wide Angle Intra Prediction (WAIP), MRL, Position Dependent Prediction Combination (PDPC), MIP, Intra-Sub Partition (ISP), and CCLM. The selected intra mode must also be encoded and transmitted to the decoder. To signal these intra modes effectively, candidate modes are derived from the intra modes of the neighboring blocks, and six of them are placed in the MPM list. The first entry in the MPM list is always fixed to the Planar mode, which is encoded with two bits. The other five MPMs are obtained according to spatial correlation with the neighbors and are encoded with three to six bits. The non-MPM modes are divided into two parts, which contain 3 and 58 modes, respectively, and are coded by truncated binary encoding with six and seven bits. The detailed intra mode signaling can be found in Figure 1. In addition, statistical experiments are conducted on the VVC Test Model version 5.0 (VTM 5.0) platform to measure the coding Bits Per intra Mode (BPM), where ten sequences with various contents from different classes are encoded under the All Intra (AI) configuration. BPM is calculated as the total coding bits of intra modes divided by the number of intra blocks, where the coding bits are collected after CABAC entropy encoding. The statistical results are shown in the left columns of Table 1; the average BPM values are 3.35, 3.48, 3.44, and 3.39 under the four QP settings.
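The bit lengths quoted above can be reproduced with a short sketch. It rests on one reading of the signaling, stated here as an assumption: one bin for the MPM flag, one bin to separate Planar from the other MPMs, a truncated unary index (at most four bins) among the remaining five MPMs, and truncated binary coding of the 61 non-MPM modes. The function names are hypothetical, and the lengths are nominal bin counts before CABAC, not measured bits.

```python
NUM_MODES = 67  # Planar + DC + 65 angular modes
NUM_MPM = 6     # size of the MPM list


def truncated_binary(index: int, n: int) -> str:
    """Truncated binary codeword for index in [0, n)."""
    if not 0 <= index < n:
        raise ValueError("index out of range")
    k = n.bit_length() - 1        # floor(log2(n))
    short = (1 << (k + 1)) - n    # number of k-bit (short) codewords
    if index < short:
        return format(index, f"0{k}b") if k > 0 else ""
    return format(index + short, f"0{k + 1}b")


def nominal_signaling_lengths() -> dict:
    """Nominal bin counts per signaling class (assumed bin layout, see lead-in)."""
    n_rest = NUM_MODES - NUM_MPM  # 61 non-MPM modes
    # MPM flag (1) + not-Planar flag (1) + truncated unary index, capped at 4 bins
    other_mpm = [2 + min(pos, 4) for pos in range(1, NUM_MPM)]
    # MPM flag (1) + truncated binary codeword
    non_mpm = [1 + len(truncated_binary(i, n_rest)) for i in range(n_rest)]
    return {
        "planar": 2,  # MPM flag + Planar flag
        "other_mpm": other_mpm,
        "non_mpm_counts": {length: non_mpm.count(length) for length in sorted(set(non_mpm))},
    }


lengths = nominal_signaling_lengths()
```

Under these assumptions, 61 symbols give 2^6 - 61 = 3 short codewords of five bits and 58 codewords of six bits; adding the MPM flag yields the six-bit and seven-bit groups of 3 and 58 modes mentioned in the text.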
Furthermore, to show how many bits are spent on intra mode signaling, the percentage of intra mode coding bits in a frame is collected and reported in the right columns of Table 1. This percentage increases from 8.28% to 20.4% on average as the QP value increases. At small QP settings the percentage is limited, because the coding bits of the residue (the difference between prediction and source) are much larger than those of the intra mode; at large QP settings the coding bits of the residue shrink, which results in a high percentage of intra mode coding bits. From these results, we can conclude that a more advanced intra mode signaling approach could further improve coding performance.
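The two statistics used in this section can be written down as a minimal sketch. The function names are hypothetical, and the counts in the example are made-up values chosen only to show the trend; they are not taken from Table 1.

```python
def bits_per_intra_mode(mode_bits: float, num_intra_blocks: int) -> float:
    """BPM: total intra mode coding bits divided by the number of intra blocks."""
    if num_intra_blocks <= 0:
        raise ValueError("num_intra_blocks must be positive")
    return mode_bits / num_intra_blocks


def mode_bit_percentage(mode_bits: float, total_bits: float) -> float:
    """Share of a frame's coding bits spent on intra mode signaling, in percent."""
    if total_bits <= 0:
        raise ValueError("total_bits must be positive")
    return 100.0 * mode_bits / total_bits


# Hypothetical frame: the mode bits stay the same while the residue bits
# shrink as QP grows, so the share of mode bits rises.
mode_bits = 3400.0
bpm = bits_per_intra_mode(mode_bits, num_intra_blocks=1000)
share_small_qp = mode_bit_percentage(mode_bits, total_bits=40000.0)
share_large_qp = mode_bit_percentage(mode_bits, total_bits=17000.0)
```

In this illustration the BPM is unchanged across the two cases, while the percentage more than doubles purely because the total bit budget shrinks. This is the same mechanism the text gives for the trend in Table 1.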