Search Results (160)

Search Parameters:
Keywords = HEVC

16 pages, 433 KiB  
Article
A Fast Coding Unit Partitioning Decision Algorithm for Versatile Video Coding Based on Gradient Feedback Hierarchical Convolutional Neural Network and Light Gradient Boosting Machine Decision Tree
by Fangmei Liu, Jiyuan Wang and Qiuwen Zhang
Electronics 2024, 13(24), 4908; https://doi.org/10.3390/electronics13244908 - 12 Dec 2024
Viewed by 219
Abstract
Video encoding technology is a foundational component in the advancement of modern technological applications. The latest video coding standard, H.266/Versatile Video Coding (VVC), features a quad-tree with nested multi-type tree (QTMT) partitioning structure, which represents an improvement over its predecessor, High-Efficiency Video Coding (H.265/HEVC). This configuration facilitates adaptable block segmentation, albeit at the cost of heightened encoding complexity. With this in mind, this paper puts forth a deep learning-based approach to facilitate CU partitioning, with the aim of supplanting the intricate CU partitioning process observed in the Versatile Video Coding Test Model (VTM). We begin by presenting the Gradient Feedback Hierarchical CNN (GFH-CNN) model, an advanced convolutional neural network derived from the ResNet architecture, enabling the extraction of features from 64 × 64 coding unit (CU) blocks. Following this, a hierarchical network diagram (HND) is crafted to depict the delineation of partition boundaries corresponding to the various levels of the CU block's layered structure. This diagram maps the features extracted by the GFH-CNN model to the partitioning at each level and boundary. Finally, a LightGBM-based decision tree classification model (L-DT) is constructed to predict the corresponding partition structure based on the prediction vector output from the GFH-CNN model. Subsequently, any errors in the partitioning results are corrected in accordance with the encoding constraints specified by the VTM, which ultimately determines the final CU block partitioning. The experimental results demonstrate that, in comparison with VTM-10.0, the proposed algorithm achieves a 48.14% reduction in complexity with only a negligible 0.83% increase in bitrate under the top-three configuration. The top-two configuration yields a higher complexity reduction of 63.78%, accompanied by a 2.08% increase in bitrate. These results demonstrate that, in comparison to existing solutions, our approach provides an optimal balance between encoding efficiency and computational complexity. Full article
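The final L-DT stage described above amounts to a gradient-boosted decision tree acting on CNN-derived feature vectors. The sketch below shows that idea with a LightGBM classifier predicting a split/no-split label for a CU; the feature dimensionality, labels, and random training data are placeholders rather than the authors' actual GFH-CNN outputs or L-DT configuration.

```python
# Minimal sketch: a LightGBM classifier over CNN-derived CU features (illustrative only).
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 64))    # placeholder GFH-CNN prediction vectors
y_train = rng.integers(0, 2, size=1000)  # placeholder labels: 1 = split, 0 = no split

clf = lgb.LGBMClassifier(n_estimators=200, num_leaves=31, learning_rate=0.05)
clf.fit(X_train, y_train)

X_cu = rng.normal(size=(1, 64))          # feature vector for one 64 x 64 CU
print("predict split:", bool(clf.predict(X_cu)[0]))  # encoder would skip RDO checks accordingly
```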
Figure 1: Algorithm flowchart of the GFH-CNN+L-DT model framework.
Figure 2: CTU partitioning in VVC. (a) VVC split types. (b) Example of CTU partitioning in MTT.
Figure 3: CTU partitioning in VVC. (a) VVC split types. (b) Schematic diagram of different levels of HND.
Figure 4: The proposed GFH-CNN model framework takes brightness information as input and outputs probability vectors.
Figure 5: Loss and accuracy rate of the GFH-CNN model.
Figure 6: Comparison of our algorithm with various algorithms [19,36,37].

16 pages, 10696 KiB  
Article
A Framework for Symmetric-Quality S3D Video Streaming Services
by Juhyeon Lee, Seungjun Lee, Sunghoon Kim and Dongwook Kang
Appl. Sci. 2024, 14(23), 11011; https://doi.org/10.3390/app142311011 - 27 Nov 2024
Viewed by 339
Abstract
This paper proposes an efficient encoding framework based on Scalable High Efficiency Video Coding (SHVC) technology, which supports both low- and high-resolution 2D videos as well as stereo 3D (S3D) video simultaneously. Previous studies have introduced Cross-View SHVC, which encodes two videos with different viewpoints and resolutions using a Cross-View SHVC encoder, where the low-resolution video is encoded as the base layer and the other video as the enhancement layer. This encoder provides resolution diversity and allows the decoder to combine the two videos, enabling 3D video services. Even when 3D videos are composed of left and right videos with different resolutions, the viewer tends to perceive the quality based on the higher-resolution video due to the binocular suppression effect, where the brain prioritizes the high-quality image and suppresses the lower-quality one. However, recent experiments have shown that when the disparity between resolutions exceeds a certain threshold, it can lead to a subjective degradation of the perceived 3D video quality. To address this issue, a conditional replenishment algorithm has been studied, which replaces some blocks of the video using a disparity-compensated left-view image based on rate–distortion cost. This conditional replenishment algorithm (also known as VEI technology) effectively reduces the quality difference between the base layer and enhancement layer videos. However, the algorithm alone cannot fully compensate for the quality difference between the left and right videos. In this paper, we propose a novel encoding framework to solve the asymmetry issue between the left and right videos in 3D video services and achieve symmetrical video quality. The proposed framework focuses on improving the quality of the right-view video by combining the conditional replenishment algorithm with Cross-View SHVC. Specifically, the framework leverages the non-HEVC option of the SHVC encoder, using a VEI (Video Enhancement Information) restored image as the base layer to provide higher-quality prediction signals and reduce encoding complexity. Experimental results using animation and live-action UHD sequences show that the proposed method achieves BD-RATE reductions of 57.78% and 45.10% compared with HEVC and SHVC codecs, respectively. Full article
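At its core, the conditional replenishment step is a per-block rate–distortion comparison. The toy function below keeps, for each block, whichever candidate (base-layer reconstruction or disparity-compensated left view) has the lower Lagrangian cost J = D + λR; the block size, SSD distortion, and fixed bit-cost estimates are illustrative assumptions, not the paper's exact formulation.

```python
# Toy conditional replenishment: per-block candidate choice by Lagrangian cost (illustrative).
import numpy as np

def replenish(right_orig, base_rec, warped_left, lam=10.0,
              bits_base=2.0, bits_warped=8.0, block=16):
    """Keep the base-layer block or replace it with the disparity-compensated
    left-view block, whichever has the lower cost J = D + lam * R."""
    out = base_rec.copy()
    h, w = right_orig.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            sl = (slice(y, y + block), slice(x, x + block))
            d_base = float(np.sum((right_orig[sl] - base_rec[sl]) ** 2))
            d_warp = float(np.sum((right_orig[sl] - warped_left[sl]) ** 2))
            if d_warp + lam * bits_warped < d_base + lam * bits_base:
                out[sl] = warped_left[sl]
    return out

# Usage with random stand-in frames (no real video involved):
rng = np.random.default_rng(1)
frame = rng.random((64, 64))
print(replenish(frame, frame + 0.1, frame + 0.02).shape)
```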
Figure 1: SHVC encoder.
Figure 2: Cross-View SHVC encoder for the hybrid 3DTV system.
Figure 3: Cross-View SHVC decoder for the hybrid 3DTV system.
Figure 4: Improved Cross-View SHVC.
Figure 5: Conditional replenishment algorithm.
Figure 6: Conditional replenishment algorithm application results.
Figure 7: VEI stream construction method.
Figure 8: SHVC codec incorporated with VEI.
Figure 9: Proposed encoding framework schematic diagram.
Figure 10: Proposed decoding framework schematic diagram.
Figure 11: Sequence snapshots used in the experiment.
Figure 12: RD curve.

18 pages, 4103 KiB  
Article
Content-Adaptive Bitrate Ladder Estimation in High-Efficiency Video Coding Utilizing Spatiotemporal Resolutions
by Jelena Šuljug and Snježana Rimac-Drlje
Electronics 2024, 13(20), 4049; https://doi.org/10.3390/electronics13204049 - 15 Oct 2024
Viewed by 610
Abstract
The constant increase in multimedia Internet traffic in the form of video streaming requires new solutions for efficient video coding to save bandwidth and network resources. HTTP adaptive streaming (HAS), the most widely used solution for video streaming, allows the client to adaptively select the bitrate according to the transmission conditions. For this purpose, multiple presentations of the same video content are generated on the video server, which contains video sequences encoded at different bitrates with resolution adjustment to achieve the best Quality of Experience (QoE). This set of bitrate–resolution pairs is called a bitrate ladder. In addition to the traditional one-size-fits-all scheme for the bitrate ladder, context-aware solutions have recently been proposed that enable optimum bitrate–resolution pairs for video sequences of different complexity. However, these solutions use only spatial resolution for optimization, while the selection of the optimal combination of spatial and temporal resolution for a given bitrate has not been sufficiently investigated. This paper proposes bit-ladder optimization considering spatiotemporal features of video sequences and usage of optimal spatial and temporal resolution related to video content complexity. Optimization along two dimensions of resolution significantly increases the complexity of the problem and the approach of intensive encoding for all spatial and temporal resolutions in a wide range of bitrates, for each video sequence, is not feasible in real time. In order to reduce the level of complexity, we propose a data augmentation using a neural network (NN)-based model. To train the NN model, we used seven video sequences of different content complexity, encoded with the HEVC encoder at five different spatial resolutions (SR) up to 4K. Also, all video sequences were encoded using four frame rates up to 120 fps, presenting different temporal resolutions (TR). The Structural Similarity Index Measure (SSIM) is used as an objective video quality metric. After data augmentation, we propose NN models that estimate optimal TR and bitrate values as switching points to a higher SR. These results can be further used as input parameters for the bitrate ladder construction for video sequences of a certain complexity. Full article
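A bitrate ladder ultimately reduces to finding, for each bitrate, which spatiotemporal resolution gives the best quality; the switching point is where a higher resolution's quality curve crosses above the current one. The snippet below performs that cross-over search on SSIM-versus-bitrate samples for two resolutions using invented numbers, purely to illustrate the mechanics rather than reproduce the paper's NN-estimated values.

```python
# Illustrative cross-over search: bitrate at which a higher resolution overtakes a lower one.
import numpy as np

bitrates = np.array([1.0, 2.0, 4.0, 8.0, 16.0])           # Mbps (made-up grid)
ssim_1080p = np.array([0.90, 0.94, 0.965, 0.975, 0.980])  # invented sample points
ssim_2160p = np.array([0.85, 0.92, 0.960, 0.978, 0.988])

grid = np.linspace(bitrates[0], bitrates[-1], 500)
q_low = np.interp(grid, bitrates, ssim_1080p)              # piecewise-linear quality curves
q_high = np.interp(grid, bitrates, ssim_2160p)
idx = int(np.argmax(q_high > q_low))                       # first point where 2160p wins
print(f"switch to the higher resolution at ~{grid[idx]:.2f} Mbps")
```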
Figure 1: Spatial and temporal information of video sequences used for model development and evaluation.
Figure 2: Achieved bitrate quality curves for video sequence Beauty.
Figure 3: Flowchart of video coding and NN training process.
Figure 4: The underlying NN architecture.
Figure 5: (a) Regression plot for trained NN; (b) training state plot.
Figure 6: (a) Error histogram plot for trained NN; (b) performance plot.
Figure 7: (a) Regression plot for trained NN_TR; (b) training state plot.
Figure 8: (a) Error histogram plot for trained NN_TR; (b) performance plot.
Figure 9: (a) Regression plot for trained NN_BR; (b) training state plot.
Figure 10: (a) Error histogram plot for trained NN_BR; (b) performance plot.

12 pages, 1792 KiB  
Article
Information Bottleneck Driven Deep Video Compression—IBOpenDVCW
by Timor Leiderman and Yosef Ben Ezra
Entropy 2024, 26(10), 836; https://doi.org/10.3390/e26100836 - 30 Sep 2024
Viewed by 817
Abstract
Video compression remains a challenging task despite significant advancements in end-to-end optimized deep networks for video coding. This study, inspired by information bottleneck (IB) theory, introduces a novel approach that combines IB theory with wavelet transform. We perform a comprehensive analysis of information and mutual information across various mother wavelets and decomposition levels. Additionally, we replace the conventional average pooling layers with a discrete wavelet transform creating more advanced pooling methods to investigate their effects on information and mutual information. Our results demonstrate that the proposed model and training technique outperform existing state-of-the-art video compression methods, delivering competitive rate-distortion performance compared to the AVC/H.264 and HEVC/H.265 codecs. Full article
(This article belongs to the Section Information Theory, Probability and Statistics)
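One concrete way to replace an average-pooling layer with a wavelet-based one is to apply a single-level 2D DWT and keep only the approximation band, which roughly halves each spatial dimension just like 2 × 2 pooling. The sketch below does this with PyWavelets; it is a minimal stand-in under that assumption, not the IBOpenDVCW layer itself.

```python
# Minimal DWT "pooling": keep the approximation sub-band of a single-level 2D wavelet transform.
import numpy as np
import pywt

def dwt_pool(feature_map, wavelet="haar"):
    """Down-sample a 2D feature map by ~2x per dimension via the DWT approximation band."""
    cA, (cH, cV, cD) = pywt.dwt2(feature_map, wavelet)
    return cA  # the detail bands cH/cV/cD are simply discarded in this toy version

x = np.random.rand(32, 32)
print(dwt_pool(x).shape)          # (16, 16) with Haar
print(dwt_pool(x, "db2").shape)   # slightly larger because of filter padding
```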
Figure 1: High-level framework of the OpenDVCW network.
Figure 2: Pyramid architecture of the optical flow estimation.
Figure 3: Visualization of DWT as we apply the transform on the approximation on every iteration.
Figure 4: Calculated information on Lenna image for various mother wavelets.
Figure 5: Performance comparison on the UVG dataset between AVC/H.264, HEVC/H.265, VVC/H.266 and OpenDVCW with Db2, Sym3 and Haar wavelets for the DWT transform in the optical flow. (A) Beauty, (B) HoneyBee, (C) ShakeNDry, (D) Bosphorus.

24 pages, 6380 KiB  
Article
Multi-Type Self-Attention-Based Convolutional-Neural-Network Post-Filtering for AV1 Codec
by Woowoen Gwun, Kiho Choi and Gwang Hoon Park
Mathematics 2024, 12(18), 2874; https://doi.org/10.3390/math12182874 - 15 Sep 2024
Viewed by 789
Abstract
Over the past few years, there has been substantial interest and research activity surrounding the application of Convolutional Neural Networks (CNNs) for post-filtering in video coding. Most current research efforts have focused on using CNNs with various kernel sizes for post-filtering, primarily concentrating on High-Efficiency Video Coding/H.265 (HEVC) and Versatile Video Coding/H.266 (VVC). This narrow focus has limited the exploration and application of these techniques to other video coding standards such as AV1, developed by the Alliance for Open Media, which offers excellent compression efficiency, reducing bandwidth usage and improving video quality, making it highly attractive for modern streaming and media applications. This paper introduces a novel approach that extends beyond traditional CNN methods by integrating three different self-attention layers into the CNN framework. Applied to the AV1 codec, the proposed method significantly improves video quality by incorporating these distinct self-attention layers. This enhancement demonstrates the potential of self-attention mechanisms to revolutionize post-filtering techniques in video coding beyond the limitations of convolution-based methods. The experimental results show that the proposed network achieves an average BD-rate reduction of 10.40% for the Luma component and 19.22% and 16.52% for the Chroma components compared to the AV1 anchor. Visual quality assessments further validated the effectiveness of our approach, showcasing substantial artifact reduction and detail enhancement in videos. Full article
(This article belongs to the Special Issue New Advances and Applications in Image Processing and Computer Vision)
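A channel-wise self-attention block of the kind combined with convolutions here can be written in a few lines of PyTorch. The layer below is a generic squeeze-style channel attention followed by a 3 × 3 convolution, included only to illustrate the mechanism; it is not the paper's MTSA design.

```python
# Generic channel-attention + convolution block (illustrative, not the proposed MTSA layers).
import torch
import torch.nn as nn

class ChannelAttentionConv(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # squeeze spatial dimensions
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                   # per-channel weights in [0, 1]
        )
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return self.conv(x * self.attn(x))                  # reweight channels, then convolve

x = torch.randn(1, 32, 64, 64)                              # dummy decoded-frame features (N, C, H, W)
print(ChannelAttentionConv(32)(x).shape)                    # torch.Size([1, 32, 64, 64])
```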
Figure 1: (a) Illustration showing where the in-loop filter is located in the video codec pipeline; (b) illustration showing where the post-filter is located in the pipeline.
Figure 2: Proposed MTSA-based CNN.
Figure 3: (a) RCB; (b) CWSA.
Figure 4: (a) Simplified feature map with channel size of 3 and height and width sizes of 4; (b) feature map unfolded into smaller blocks; (c) feature map permuted and reshaped.
Figure 5: (a) BWSSA; (b) PWSA.
Figure 6: R-D curves by SVT-AV1 and MTSA. (a) class A1; (b) class A2; (c) class A3; (d) class A4; (e) class A5.
Figure 7: Example sequence of Class A1 PierSeaSide. (a) Original image from the AVM-CTC sequence; (b) detail inside the yellow box from (a) in the original image; (c) detail inside the yellow box from (a) in the compressed image using SVT-AV1 with QP55; (d) detail inside the yellow box from (a) after applying the post-filter using the proposed network.
Figure 8: Example sequence of Class A1 Tango. (a) Original image from the AVM-CTC sequence; (b) detail inside the yellow box from (a) in the original image; (c) detail inside the yellow box from (a) in the compressed image using SVT-AV1 with QP55; (d) detail inside the yellow box from (a) after applying the post-filter using the proposed network.
Figure 9: Example sequence of Class A2 RushFieldCuts. (a) Original image from the AVM-CTC sequence; (b) detail inside the yellow box from (a) in the original image; (c) detail inside the yellow box from (a) in the compressed image using SVT-AV1 with QP43; (d) detail inside the yellow box from (a) after applying the post-filter using the proposed network.
Figure 10: Methods to handle empty spaces for edge patches; (a) empty spaces filled with zero value; (b) empty spaces filled with edge pixel value extended.
Figure 11: Network wrongly turning edge pixels into darker values; (a) pixel value difference between the original video frame and the AV1-encoded frame; (b) pixel value difference between the original video frame and the AV1-encoded frame processed by the proposed network, with larger positive pixel differences in Y indicating that the processed frame is darker, at the bottom of the image.

26 pages, 7340 KiB  
Article
Versatile Video Coding-Post Processing Feature Fusion: A Post-Processing Convolutional Neural Network with Progressive Feature Fusion for Efficient Video Enhancement
by Tanni Das, Xilong Liang and Kiho Choi
Appl. Sci. 2024, 14(18), 8276; https://doi.org/10.3390/app14188276 - 13 Sep 2024
Viewed by 1006
Abstract
Advanced video codecs such as High Efficiency Video Coding/H.265 (HEVC) and Versatile Video Coding/H.266 (VVC) are vital for streaming high-quality online video content, as they compress and transmit data efficiently. However, these codecs can occasionally degrade video quality by adding undesirable artifacts such as blockiness, blurriness, and ringing, which can detract from the viewer’s experience. To ensure a seamless and engaging video experience, it is essential to remove these artifacts, which improves viewer comfort and engagement. In this paper, we propose a deep feature fusion based convolutional neural network (CNN) architecture (VVC-PPFF) for post-processing approach to further enhance the performance of VVC. The proposed network, VVC-PPFF, harnesses the power of CNNs to enhance decoded frames, significantly improving the coding efficiency of the state-of-the-art VVC video coding standard. By combining deep features from early and later convolution layers, the network learns to extract both low-level and high-level features, resulting in more generalized outputs that adapt to different quantization parameter (QP) values. The proposed VVC-PPFF network achieves outstanding performance, with Bjøntegaard Delta Rate (BD-Rate) improvements of 5.81% and 6.98% for luma components in random access (RA) and low-delay (LD) configurations, respectively, while also boosting peak signal-to-noise ratio (PSNR). Full article
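The central idea, fusing shallow and deep convolutional features before reconstructing the enhanced frame, can be sketched as a tiny residual network that concatenates an early feature map with a later one. Layer counts and channel widths below are placeholders, not the VVC-PPFF architecture.

```python
# Toy early/late feature-fusion post-filter (placeholder sizes, not the actual VVC-PPFF model).
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    def __init__(self, ch: int = 32):
        super().__init__()
        self.early = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.deep = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
            for _ in range(4)
        ])
        self.fuse = nn.Conv2d(2 * ch, 3, 3, padding=1)   # concatenate early + deep, predict a residual

    def forward(self, decoded):
        e = self.early(decoded)
        d = self.deep(e)
        return decoded + self.fuse(torch.cat([e, d], dim=1))  # enhanced frame

print(FusionNet()(torch.randn(1, 3, 64, 64)).shape)
```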
Figure 1: Enhancing video quality with CNN-based post-processing in the conventional VVC coding workflow.
Figure 2: MP4 to YUV conversion and reconstruction using VVenC and VVdeC.
Figure 3: Illustration of the video-to-image conversion process: (a) original videos converted to original images using FFmpeg, and (b) reconstructed videos converted to reconstructed images using FFmpeg.
Figure 4: Illustration of the conversion process from YUV 4:2:0 format to YUV 4:4:4 format before feeding data into the deep learning network.
Figure 5: Illustration of the down-sampling process of the neural network output from YUV 4:4:4 to YUV 4:2:0 format.
Figure 6: Architecture of the proposed CNN-based post-filtering method, integrating multiple feature extractions for enhanced output refinement.
Figure 7: Comparative visualization of (b) reconstructed frames from anchor VVC and (c) proposed methods for the DaylightRoad2 sequence at QP 42 for the RA configuration, alongside (a) the original uncompressed reference frame.
Figure 8: Comparative visualization of (b) reconstructed frames from anchor VVC and (c) proposed methods for the FourPeople sequence at QP 42 for the LD configuration, alongside (a) the original uncompressed reference frame.
Figure 9: RD curve performance comparison for five different test sequences in the RA configuration.
Figure 10: RD curve performance comparison for four different test sequences in the LD configuration.
Figure 11: Visual quality comparison of the proposed method with 8 feature extraction blocks for RA and LD scenarios at QP 42: (a) MarketPlace sequence and (b) PartyScene sequence.
Figure 12: Visual quality comparison of the proposed method with 12 feature extraction blocks for RA and LD scenarios at QP 42: (a) RitualDance sequence and (b) Cactus sequence.

34 pages, 2908 KiB  
Article
A Hybrid Contrast and Texture Masking Model to Boost High Efficiency Video Coding Perceptual Rate-Distortion Performance
by Javier Ruiz Atencia, Otoniel López-Granado, Manuel Pérez Malumbres, Miguel Martínez-Rach, Damian Ruiz Coll, Gerardo Fernández Escribano and Glenn Van Wallendael
Electronics 2024, 13(16), 3341; https://doi.org/10.3390/electronics13163341 - 22 Aug 2024
Viewed by 623
Abstract
As most of the videos are destined for human perception, many techniques have been designed to improve video coding based on how the human visual system perceives video quality. In this paper, we propose the use of two perceptual coding techniques, namely contrast masking and texture masking, jointly operating under the High Efficiency Video Coding (HEVC) standard. These techniques aim to improve the subjective quality of the reconstructed video at the same bit rate. For contrast masking, we propose the use of a dedicated weighting matrix for each block size (from 4×4 up to 32×32), unlike the HEVC standard, which only defines an 8×8 weighting matrix which it is upscaled to build the 16×16 and 32×32 weighting matrices (a 4×4 weighting matrix is not supported). Our approach achieves average Bjøntegaard Delta-Rate (BD-rate) gains of between 2.5% and 4.48%, depending on the perceptual metric and coding mode used. On the other hand, we propose a novel texture masking scheme based on the classification of each coding unit to provide an over-quantization depending on the coding unit texture level. Thus, for each coding unit, its mean directional variance features are computed to feed a support vector machine model that properly predicts the texture type (plane, edge, or texture). According to this classification, the block’s energy, the type of coding unit, and its size, an over-quantization value is computed as a QP offset (DQP) to be applied to this coding unit. By applying both techniques in the HEVC reference software, an overall average of 5.79% BD-rate gain is achieved proving their complementarity. Full article
(This article belongs to the Special Issue Recent Advances in Image/Video Compression and Coding)
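As a rough picture of the texture-masking side, the sketch below computes simple directional variances for a block and feeds them to an SVM that labels the block as plane, edge, or texture. The feature definition, training data, and labels are stand-ins; they are not the mean directional variance features or the trained models from the paper.

```python
# Illustrative directional-variance features + SVM texture classifier (stand-in definitions).
import numpy as np
from sklearn.svm import SVC

def directional_variances(block):
    """Variance of first differences along four directions (a crude MDV-like descriptor)."""
    b = block.astype(float)
    return np.array([
        np.var(np.diff(b, axis=1)),        # horizontal
        np.var(np.diff(b, axis=0)),        # vertical
        np.var(b[1:, 1:] - b[:-1, :-1]),   # 45-degree diagonal
        np.var(b[1:, :-1] - b[:-1, 1:]),   # 135-degree diagonal
    ])

rng = np.random.default_rng(2)
X = rng.random((300, 4))                   # placeholder feature vectors
y = rng.integers(0, 3, size=300)           # placeholder labels: 0 plane, 1 edge, 2 texture
svm = SVC(kernel="rbf").fit(X, y)

block = rng.integers(0, 256, size=(16, 16))
print(svm.predict(directional_variances(block).reshape(1, -1)))
```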
Figure 1: Default HEVC quantization weighting matrices.
Figure 2: Contrast sensitivity function. The red curve represents the original CSF as defined by Equation (1), while the blue dashed curve represents the flattened CSF, with spatial frequencies below the peak sensitivity saturated.
Figure 3: Proposed 4 × 4 quantization weighting matrices for intra- and interprediction modes.
Figure 4: Rate-distortion curves comparing our proposed CSF with the default implemented in the HEVC standard using different perceptual metrics. (a,b) correspond to the BQTerrace sequence of class B, while (c,d) correspond to the ChinaSpeed sequence of class F.
Figure 5: Samples of manually classified blocks (left-hand side) and their associated polar diagram of the MDV metric (right-hand side). From top to bottom: 8 × 8, 16 × 16, and 32 × 32 block sizes; from left- to right-hand side: plain, edge, and texture blocks.
Figure 6: (a) Scatter plot of manually classified 16 × 16 blocks (training dataset), and (b) the classification results provided by the trained SVM model (testing dataset).
Figure 7: Example of block classification for the first frame of sequence BasketballDrill, using optimal SVM models for each block size.
Figure 8: Box and whisker plot of the block energy (ε) distribution by size and texture classification.
Figure 9: Representation of Equation (6) for two sets of function parameters: (red) MinE1, MaxE1, and MaxQStep1, and (blue) MinE2, MaxE2, and MaxQStep2. ΔQStep_i,j is different for each set.
Figure 10: Flowchart of candidate selection for brute-force analysis of perceptually optimal parameters. The Ps in the energy range boxes refer to the percentile.
Figure 11: BD-rate curves (MS-SSIM metric) for the PeopleOnStreet video test sequence over the MaxQStep parameter when modifying texture blocks of size 8. Each curve represents a different block energy range (MinE and MaxE).
Figure 12: Rate-distortion curves of the first frame of the BQSquare sequence, comparing our proposed contrast masking (red line) and contrast and texture masking (yellow line) with the HM reference coding (blue line), using the (a) SSIM, (b) MS-SSIM, and (c) PSNR-HVS-M perceptual metrics.
Figure 13: Visual comparison of the first frame of the BQSquare sequence encoded at QP = 22. (a) HM reference-encoded frame; (b) frame encoded with contrast and texture masking.
Figure A1: Traffic 2560 × 1600 30 fps Class A.
Figure A2: PeopleOnStreet 2560 × 1600 30 fps Class A.
Figure A3: NebutaFestival 2560 × 1600 60 fps Class A.
Figure A4: SteamLocomotiveTrain 2560 × 1600 60 fps Class A.
Figure A5: Kimono 1920 × 1080 24 fps Class B.
Figure A6: ParkScene 1920 × 1080 24 fps Class B.
Figure A7: Cactus 1920 × 1080 50 fps Class B.
Figure A8: BQTerrace 1920 × 1080 60 fps Class B.
Figure A9: BasketballDrive 1920 × 1080 50 fps Class B.
Figure A10: RaceHorses 832 × 480 30 fps Class C.
Figure A11: BQMall 832 × 480 60 fps Class C.
Figure A12: PartyScene 832 × 480 50 fps Class C.
Figure A13: BasketballDrill 832 × 480 50 fps Class C.
Figure A14: RaceHorses 416 × 240 30 fps Class D.
Figure A15: BQSquare 416 × 240 60 fps Class D.
Figure A16: BlowingBubbles 416 × 240 50 fps Class D.
Figure A17: BasketballPass 416 × 240 50 fps Class D.
Figure A18: FourPeople 1280 × 720 60 fps Class E.
Figure A19: Johnny 1280 × 720 60 fps Class E.
Figure A20: KristenAndSara 1280 × 720 60 fps Class E.
Figure A21: BasketballDrillText 832 × 480 50 fps Class F.
Figure A22: ChinaSpeed 1024 × 768 30 fps Class F.
Figure A23: SlideEditing 1280 × 720 30 fps Class F.
Figure A24: SlideShow 1280 × 720 20 fps Class F.

19 pages, 7973 KiB  
Article
Determining Thresholds for Optimal Adaptive Discrete Cosine Transformation
by Alexander Khanov, Anastasija Shulzhenko, Anzhelika Voroshilova, Alexander Zubarev, Timur Karimov and Shakeeb Fahmi
Algorithms 2024, 17(8), 366; https://doi.org/10.3390/a17080366 - 21 Aug 2024
Viewed by 627
Abstract
The discrete cosine transform (DCT) is widely used for image and video compression. Lossy algorithms such as JPEG, WebP, BPG and many others are based on it. Multiple modifications of DCT have been developed to improve its performance. One of them is adaptive DCT (ADCT) designed to deal with heterogeneous image structure and it may be found, for example, in the HEVC video codec. Adaptivity means that the image is divided into an uneven grid of squares: smaller ones retain information about details better, while larger squares are efficient for homogeneous backgrounds. The practical use of adaptive DCT algorithms is complicated by the lack of optimal threshold search algorithms for image partitioning procedures. In this paper, we propose a novel method for optimal threshold search in ADCT using a metric based on tonal distribution. We define two thresholds: pm, the threshold defining solid mean coloring, and ps, defining the quadtree fragment splitting. In our algorithm, the values of these thresholds are calculated via polynomial functions of the tonal distribution of a particular image or fragment. The polynomial coefficients are determined using the dedicated optimization procedure on the dataset containing images from the specific domain, urban road scenes in our case. In the experimental part of the study, we show that ADCT allows a higher compression ratio compared to non-adaptive DCT at the same level of quality loss, up to 66% for acceptable quality. The proposed algorithm may be used directly for image compression, or as a core of video compression framework in traffic-demanding applications, such as urban video surveillance systems. Full article
(This article belongs to the Special Issue Algorithms for Image Processing and Machine Vision)
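The quadtree logic the two thresholds control can be sketched recursively: a fragment flat enough (below pm) is replaced by its mean tone, a sufficiently simple fragment is DCT-coded as a whole, and anything else is split into four. The variance criterion and fixed threshold values below are placeholders for the paper's tonal-distribution metric and polynomial threshold functions.

```python
# Recursive adaptive-DCT quadtree sketch (variance stands in for the tonal-distribution metric).
import numpy as np
from scipy.fft import dctn

def adct(block, pm=5.0, ps=400.0, min_size=8):
    """Return (y, x, size, kind, payload) leaves for one square image block."""
    def rec(b, y, x):
        size = b.shape[0]
        v = float(np.var(b))
        if v < pm:                                    # near-solid fragment: store the mean tone only
            return [(y, x, size, "mean", float(b.mean()))]
        if v < ps or size <= min_size:                # simple enough: DCT-code the whole fragment
            return [(y, x, size, "dct", dctn(b, norm="ortho"))]
        h = size // 2                                 # otherwise split into four sub-fragments
        return (rec(b[:h, :h], y, x) + rec(b[:h, h:], y, x + h) +
                rec(b[h:, :h], y + h, x) + rec(b[h:, h:], y + h, x + h))
    return rec(block.astype(float), 0, 0)

img = np.random.rand(64, 64) * 255
print(len(adct(img)), "leaves")
```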
Figure 1: Examples of video compression using 3D DCT, comparing the original video frame and various compression techniques, with variable-temporal-length 3D DCT implementations in the bottom row, clearly showing its superiority.
Figure 2: Example of adaptive 3D DCT in a video frame, showcasing both fragmenting and replacing entire fragments with their average tones.
Figure 3: Example of adaptive discrete cosine transform: (a) original image; (b) quadtree grid on the image, yellow color marks the solid color fragments; (c) adaptive discrete cosine transform spectrum of the image (brightened up for clarity).
Figure 4: Examples of image histograms: (a) histogram of the original image; (b) histogram after increasing the brightness by 40%; (c) histogram after normalization, i.e., extending it to the entire 0–255 spectrum width; (d) histogram after multiplying the brightness by 2, resulting in a "whiteout" (overexposure).
Figure 5: Example of solid color fragments: (a) processed image; (b) processed image: solid color fragments are marked with yellow; (c) close-ups of the sky and the road surface demonstrate little impact on the overall perception unless zoomed in.
Figure 6: Image quality comparison: (a) the original images; (b) high-quality ADCT; (c) medium-quality ADCT; (d) low-quality ADCT.
Figure 7: Flowchart of the optimization process; the grid is the search area; each grid cell is a pair of threshold values.
Figure 8: Optimal threshold values for different original ITDV values of the images.
Figure 9: Examples of the test dataset frames.
Figure 10: Distribution of ITDV values through all images of the considered dataset.
Figure 11: Comparison of resulting MS-SSIM and compression ratio after ADCT and non-adaptive DCT for ITDV values from 9 to 16.
Figure 12: Graphical comparison of proposed ADCT with non-adaptive DCT in terms of approximated bit rate vs. MS-SSIM.
Figure 13: Comparison [25] with the existing algorithms.
Figure 14: Advantage in low target quality compression of ADCT over DCT and preservation of information about vehicles in the scene: (a) original image; (b) fragmentation quadtree of ADCT and grid of DCT (yellow fragments are solid color); (c) resulting images (details about car positions are preserved).
Figure A1: Highway footage frame test: (a) The original image. (b) High-quality ADCT and DCT. (c) Medium-quality ADCT and DCT. (d) Low-quality ADCT and DCT.
Figure A2: Recorder footage frame test: (a) The original image. (b) High-quality ADCT and DCT. (c) Medium-quality ADCT and DCT. (d) Low-quality ADCT and DCT.
Figure A3: Dark highway photo test: (a) The original image. (b) High-quality ADCT and DCT. (c) Medium-quality ADCT and DCT. (d) Low-quality ADCT and DCT.
Figure A4: Sunny highway photo test: (a) The original image. (b) High-quality ADCT and DCT. (c) Medium-quality ADCT and DCT. (d) Low-quality ADCT and DCT.

16 pages, 10945 KiB  
Article
Impact of Video Motion Content on HEVC Coding Efficiency
by Khalid A. M. Salih, Ismail Amin Ali and Ramadhan J. Mstafa
Computers 2024, 13(8), 204; https://doi.org/10.3390/computers13080204 - 18 Aug 2024
Viewed by 968
Abstract
Digital video coding aims to reduce the bitrate and keep the integrity of visual presentation. High-Efficiency Video Coding (HEVC) can effectively compress video content to be suitable for delivery over various networks and platforms. Finding the optimal coding configuration is challenging as the compression performance highly depends on the complexity of the encoded video sequence. This paper evaluates the effects of motion content on coding performance and suggests an adaptive encoding scheme based on the motion content of encoded video. To evaluate the effects of motion content on the compression performance of HEVC, we tested three coding configurations with different Group of Pictures (GOP) structures and intra refresh mechanisms. Namely, open GOP IPPP, open GOP Periodic-I, and closed GOP periodic-IDR coding structures were tested using several test sequences with a range of resolutions and motion activity. All sequences were first tested to check their motion activity. The rate–distortion curves were produced for all the test sequences and coding configurations. Our results show that the performance of IPPP coding configuration is significantly better (up to 4 dB) than periodic-I and periodic-IDR configurations for sequences with low motion activity. For test sequences with intermediate motion activity, IPPP configuration can still achieve a reasonable quality improvement over periodic-I and periodic-IDR configurations. However, for test sequences with high motion activity, IPPP configuration has a very small performance advantage over periodic-I and periodic-IDR configurations. Our results indicate the importance of selecting the appropriate coding structure according to the motion activity of the video being encoded. Full article
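A simple way to gauge motion activity before picking a coding structure is to average optical-flow magnitudes between consecutive frames and compare against cut-off values. The snippet below does this with OpenCV's Farnebäck flow; the thresholds and the flow-based metric are illustrative assumptions, not the MVpp measure or the decision rules from the paper.

```python
# Illustrative motion-activity check driving a GOP-structure choice (thresholds are made up).
import cv2
import numpy as np

def avg_motion(prev_gray, curr_gray):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    return float(np.mean(np.linalg.norm(flow, axis=2)))  # mean motion magnitude per pixel

def pick_config(motion):
    if motion < 0.5:        # low motion: open GOP IPPP pays off the most
        return "IPPP"
    if motion < 2.0:        # intermediate motion
        return "Periodic-I"
    return "Periodic-IDR"   # high motion: little gain from IPPP anyway

prev = np.random.randint(0, 256, (240, 416), dtype=np.uint8)  # stand-in grayscale frames
curr = np.roll(prev, 2, axis=1)
print(pick_config(avg_motion(prev, curr)))
```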
Figure 1: HEVC video coding encoder [23].
Figure 2: HEVC coding units.
Figure 3: HEVC coding blocks.
Figure 4: Proposed framework.
Figure 5: Snapshots of test sequences used [38,39].
Figure 6: Configurations tested.
Figure 7: Average MVpp of tested video sequences.
Figure 8: Frame 90 of Crowd_run, showing motion vectors as white lines.
Figure 9: Frame 90 of HoneyBee, showing motion vectors as short red lines.
Figure 10: Rate–distortion curve for the HoneyBee test sequence.
Figure 11: Rate–distortion curve for the Sunflower test sequence.
Figure 12: Rate–distortion curve for the FourPeople test sequence.
Figure 13: Rate–distortion curve for the Mobcal test sequence.
Figure 14: Rate–distortion curve for the Shields test sequence.
Figure 15: Rate–distortion curve for the YachtRide test sequence.
Figure 16: Rate–distortion curve for the Ducks_take_off test sequence.
Figure 17: Rate–distortion curve for the Crowd_run test sequence.
Figure 18: Encoding time.
Figure 19: Decoding time.

12 pages, 1161 KiB  
Article
Investigating the Hepatitis E Virus (HEV) Diversity in Rat Reservoirs from Northern Italy
by Luca De Sabato, Marina Monini, Roberta Galuppi, Filippo Maria Dini, Giovanni Ianiro, Gabriele Vaccari, Fabio Ostanello and Ilaria Di Bartolo
Pathogens 2024, 13(8), 633; https://doi.org/10.3390/pathogens13080633 - 29 Jul 2024
Cited by 1 | Viewed by 1287
Abstract
Hepatitis E virus belonging to the Rocahepevirus ratti species, genotype HEV-C1, has been extensively reported in rats in Europe, Asia and North America. Recently, human cases of hepatitis associated with HEV-C1 infection have been reported, but the zoonotic nature of rat-HEV remains controversial. The transmission route of rat-HEV is unidentified and requires further investigation. The HEV strains of the Paslahepevirus balayani species, belonging to the same Hepeviridae family, and including the zoonotic genotype HEV-3 usually found in pigs, have also sporadically been identified in rats. We sampled 115 rats (liver, lung, feces) between 2020 and 2023 in Northeast Italy and the HEV detection was carried out by using Reverse Transcription PCR. HEV RNA was detected in 3/115 (2.6%) rats who tested positive for HEV-C1 strains in paired lung, intestinal contents and liver samples. Overall, none tested positive for the Paslahepevirus balayani strains. In conclusion, our results confirm the presence of HEV-rat in Italy with a prevalence similar to previous studies but show that there is a wide heterogeneity of strains in circulation. The detection of HEV-C1 genotype of Rocahepevirus ratti species in some human cases of acute hepatitis suggests that HEV-C1 may be an underestimated source of human infections. This finding, with the geographically widespread detection of HEV-C1 in rats, raises questions about the role of rats as hosts for both HEV-C1 and HEV-3 and the possibility of zoonotic transmission. Full article
Figure 1: Phylogenetic analysis based on the 270 nt fragment of the partial RdRp region within ORF1 of the 9 sequences obtained in this study (entries highlighted in bold and indicated by black dots), 124 HEV-C1 sequences obtained from the NCBI database by BLASTn searches, and two HEV-C2 sequences used as an outgroup. The maximum likelihood tree was produced using the TIM2 model (Transition model 2) with invariant sites and gamma distribution based on 1000 bootstrap replications, with bootstrap values >70 indicated at their respective nodes. Sequence entries are reported as GenBank Accession Number, Country and Host species. On the right side, sequences belonging to the G1–G3 group of HEV-C1 are indicated.
Figure 2: Alignment of amino acid sequences and secondary structure elements of HEV capsid proteins. The first line indicates the PDB number (2ZTN) of the capsid protein from the HEV-3 strain along with its secondary structure elements. On the left, the names of strains analyzed in the capsid proteins are indicated with accession numbers, genotypes, and subtypes. The capsid proteins of the strains sequenced in this study are indicated by the names RatHEV119IT21 and RatHEV115IT21. Spiral lines indicate helices, while arrows represent β strands. White characters in red boxes represent strictly conserved residues and red characters represent stereochemically identical residues.

19 pages, 746 KiB  
Article
Fast Depth Map Coding Algorithm for 3D-HEVC Based on Gradient Boosting Machine
by Xiaoke Su, Yaqiong Liu and Qiuwen Zhang
Electronics 2024, 13(13), 2586; https://doi.org/10.3390/electronics13132586 - 1 Jul 2024
Viewed by 995
Abstract
Three-Dimensional High-Efficiency Video Coding (3D-HEVC) has been extensively researched due to its efficient compression and deep image representation, but encoding complexity continues to pose a difficulty. This is mainly attributed to redundancy in the coding unit (CU) recursive partitioning process and rate–distortion (RD) cost calculation, resulting in a complex encoding process. Therefore, enhancing encoding efficiency and reducing redundant computations are key objectives for optimizing 3D-HEVC. This paper introduces a fast-encoding method for 3D-HEVC, comprising an adaptive CU partitioning algorithm and a rapid rate–distortion-optimization (RDO) algorithm. Based on the ALV features extracted from each coding unit, a Gradient Boosting Machine (GBM) model is constructed to obtain the corresponding CU thresholds. These thresholds are compared with the ALV to further decide whether to continue dividing the coding unit. The RDO algorithm is used to optimize the RD cost calculation process, selecting the optimal prediction mode as much as possible. The simulation results show that this method saves 52.49% of complexity while ensuring good video quality. Full article
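The ALV-versus-threshold decision can be pictured as follows: a gradient-boosting regressor (standing in for the paper's GBM model) maps a few CU descriptors to a variance threshold, and partitioning stops once the CU's average local variance falls below it. The feature choice, training data, and local-variance window are all illustrative assumptions.

```python
# Sketch: GBM-predicted threshold vs. average local variance (ALV) of a depth-map CU.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def average_local_variance(cu, win=4):
    """Mean of the variances of non-overlapping win x win sub-blocks (a simple ALV proxy)."""
    h, w = cu.shape
    return float(np.mean([np.var(cu[y:y + win, x:x + win])
                          for y in range(0, h, win) for x in range(0, w, win)]))

rng = np.random.default_rng(3)
X = rng.random((500, 3))                 # placeholder CU descriptors (e.g., size, QP, depth level)
y = rng.random(500) * 50                 # placeholder threshold targets
gbm = GradientBoostingRegressor().fit(X, y)

cu = rng.integers(0, 256, (32, 32)).astype(float)
threshold = gbm.predict([[0.5, 0.5, 0.3]])[0]
print("stop splitting" if average_local_variance(cu) < threshold else "keep splitting")
```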
Figure 1: Example of quadtree division from CTU to CU.
Figure 2: Predictive mode diagram of 3D-HEVC.
Figure 3: Analysis of encoding unit complexity.
Figure 4: Process of local variance calculation.
Figure 5: Cumulative distribution of CU sizes in depth maps for ALV.
Figure 6: Depicting the method flow based on the GBM model.
Figure 7: Threshold curves for CU sizes in 3D-HEVC and their modeling functions.
Figure 8: Proportion of time consumption for bitrate prediction, depth distortion calculation, and SVD calculation in different sequences.
Figure 9: Flowchart of the fast RDO algorithm.
Figure 10: Experimental results of RD trajectory.

16 pages, 1739 KiB  
Article
Light-Field Image Compression Based on a Two-Dimensional Prediction Coding Structure
by Jianrui Shao, Enjian Bai, Xueqin Jiang and Yun Wu
Information 2024, 15(6), 339; https://doi.org/10.3390/info15060339 - 7 Jun 2024
Cited by 1 | Viewed by 951
Abstract
Light-field images (LFIs) are gaining increased attention within the field of 3D imaging, virtual reality, and digital refocusing, owing to their wealth of spatial and angular information. The escalating volume of LFI data poses challenges in terms of storage and transmission. To address this problem, this paper introduces an MSHPE (most-similar hierarchical prediction encoding) structure based on light-field multi-view images. By systematically exploring the similarities among sub-views, our structure obtains residual views through the subtraction of the encoded view from its corresponding reference view. Regarding the encoding process, this paper implements a new encoding scheme to process all residual views, achieving lossless compression. High-efficiency video coding (HEVC) is applied to encode select residual views, thereby achieving lossy compression. Furthermore, the introduced structure is conceptualized as a layered coding scheme, enabling progressive transmission and showing good random access performance. Experimental results demonstrate the superior compression performance attained by encoding residual views according to the proposed structure, outperforming alternative structures. Notably, when HEVC is employed for encoding residual views, significant bit savings are observed compared to the direct encoding of original views. The final restored view presents better detail quality, reinforcing the effectiveness of this approach. Full article
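The residual-view construction at the heart of the scheme is a subtraction of each sub-view from its assigned reference view. The snippet below forms residuals against the centre view of a toy 3 × 3 view grid and shows the lossless round trip; using a single reference is a simplification of the paper's most-similar hierarchical reference assignment.

```python
# Simplified residual-view construction for a light-field view grid (centre view as reference).
import numpy as np

rng = np.random.default_rng(4)
views = rng.integers(0, 256, size=(3, 3, 64, 64)).astype(np.int16)  # toy 3x3 grid of sub-views
ref = views[1, 1]                            # centre view is kept as-is and used as the reference

residuals = views - ref                      # every view becomes a (signed) residual view
recovered = residuals + ref                  # decoder side: add the reference back
assert np.array_equal(recovered, views)      # lossless round trip
print(residuals.min(), residuals.max())      # residuals shrink as the sub-views get more similar
```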
Figure 1: Two light-field representations: (a) MI, (b) SAIs.
Figure 2: (a) PSNR value between the center view and other views, (b) representation of the prediction structure.
Figure 3: Reference view and coding sequence.
Figure 4: Two pixel histograms of residual plots: (a) histogram of medieval's residual, (b) histogram of corner's residual.
Figure 5: Progressive transmission diagram.
Figure 6: The presentation of the test images. (a) Cotton, (b) dino, (c) antinous, (d) medieval, (e) kitchen, (f) boardgames, (g) car, (h) room, and (i) corner.
Figure 7: The presentation of the five different prediction structures and the coding sequence and hierarchical diagram of the proposed structures. (a) Spiral, (b) raster, (c) MVI, (d) 2-DHS, (e) 2-DKS, coding order and hierarchical diagram (the different colors represent the different layers, and the numbers represent the encoding order).
Figure 8: Rate–distortion curves for the different difference sequences encoded using HEVC: (a) medieval, (b) kitchen, (c) boardgames.
Figure 9: Comparison of the details of the original image and the reconstruction images after MSHPE and HEVC. (a) QP = 40, location is (4,5); (b) QP = 40, location is (1,1); (c) QP = 20, location is (4,5).
Figure 10: Graphs of the number of bits and the average Y-PSNR as a function of QP for the five schemes: (a) curve of number of bits with QP, (b) curve of average Y-PSNR with QP.

15 pages, 4633 KiB  
Article
Faster Intra-Prediction of Versatile Video Coding Using a Concatenate-Designed CNN via DCT Coefficients
by Sio-Kei Im and Ka-Hou Chan
Electronics 2024, 13(11), 2214; https://doi.org/10.3390/electronics13112214 - 6 Jun 2024
Viewed by 713
Abstract
As the next generation video coding standard, Versatile Video Coding (VVC) significantly improves coding efficiency over the current High-Efficiency Video Coding (HEVC) standard. In practice, this improvement comes at the cost of increased pre-processing complexity, which makes VVC challenging to deploy where encoding time matters. This work presents a technique to simplify VVC intra-prediction using Discrete Cosine Transform (DCT) feature analysis and a concatenate-designed CNN. The coefficients of the DCT-transformed CUs reflect the complexity of the original texture, and the proposed CNN employs multiple classifiers to predict whether they should be split. This approach can determine whether to split Coding Units (CUs) of different sizes according to the Versatile Video Coding (VVC) standard, which helps to simplify the intra-prediction process. The experimental results indicate that our approach can reduce the encoding time by 52.77% with a minimal increase of 1.48% in Bjøntegaard Delta Bit Rate (BDBR) compared to the original algorithm, demonstrating a result competitive with other state-of-the-art methods in terms of coding efficiency and video quality. Full article
(This article belongs to the Special Issue Image and Video Processing Based on Deep Learning)
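The front end of the approach, turning a CU's luma samples into DCT coefficients whose spread reflects texture complexity, can be sketched as below. The high-frequency-energy ratio and the fixed threshold are only stand-ins for the concatenate-designed CNN classifiers described in the abstract.

```python
# Sketch: DCT-coefficient energy as a crude stand-in for the CNN split classifiers.
import numpy as np
from scipy.fft import dctn

def high_freq_ratio(cu):
    """Share of DCT energy outside the low-frequency quarter of the block."""
    c = dctn(cu.astype(float), norm="ortho")
    h, w = c.shape
    total = float(np.sum(c ** 2)) + 1e-12
    low = float(np.sum(c[: h // 2, : w // 2] ** 2))
    return 1.0 - low / total

cu = np.random.randint(0, 256, (32, 32))
ratio = high_freq_ratio(cu)
print("split" if ratio > 0.1 else "no split", f"(high-frequency ratio = {ratio:.3f})")  # 0.1 is illustrative
```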
Show Figures

Figure 1

Figure 1
<p>The complexity of the CU structure is highly dependent on the <math display="inline"><semantics> <mi mathvariant="italic">QP</mi> </semantics></math> values that control the video quality, so more complex parts are split into smaller CUs.</p>
Full article ">Figure 2
<p>Distribution of various CU sizes with respect to different <math display="inline"><semantics> <mi mathvariant="italic">QP</mi> </semantics></math>s.</p>
Full article ">Figure 3
<p>The computation of encoding and decoding features. The original colour is projected into the frequency domain using DCT transformation. The zero, low, medium, and high frequencies are, respectively, filled by yellow, blue, green, and orange colours. The resulting DCT coefficients are then passed to quantisation to reduce the higher-frequency domain, resulting in a simplified feature for optimal CU determination.</p>
Figure 4">
Figure 4
The main structure of the proposed model.
Figure 5
An illustration of feature extraction by VGGreNet, and the process of collecting these features using the concatenate-designed CNN with multiple classifiers.
25 pages, 940 KiB  
Article
Fast Versatile Video Coding (VVC) Intra Coding for Power-Constrained Applications
by Lei Chen, Baoping Cheng, Haotian Zhu, Haowen Qin, Lihua Deng and Lei Luo
Electronics 2024, 13(11), 2150; https://doi.org/10.3390/electronics13112150 - 31 May 2024
Cited by 2 | Viewed by 978
Abstract
Versatile Video Coding (VVC) achieves an impressive coding gain (about 40% or more) over the preceding High-Efficiency Video Coding (HEVC) technology at the cost of extremely high computational complexity. Such a large complexity increase is a serious challenge for power-constrained applications such as the Internet of Video Things. For intra coding, VVC applies a brute-force recursive search to both the coding unit (CU) partition structure, which is based on the quadtree with nested multi-type tree (QTMT), and 67 intra prediction modes, compared to 35 in HEVC. To address this, we offer optimization strategies for the CU partition decision and the intra coding modes to lessen the computational overhead. Regarding the high complexity of the CU partition process, CUs are first categorized as simple, fuzzy, and complex based on their texture characteristics. We then train two random forest classifiers to speed up the RDO-based brute-force recursive search: one directly predicts the optimal partition modes for simple and complex CUs, while the other determines early termination of the partition process for fuzzy CUs. Meanwhile, to reduce the complexity of intra mode prediction, a fast hierarchical intra mode search method is designed based on the texture features of CUs, including texture complexity, texture direction, and texture context information. Extensive experiments demonstrate that the proposed approach reduces complexity by up to 77% compared to the latest VVC reference software (VTM-23.1), and an average coding time saving of 70% is achieved with only a 1.65% increase in BDBR. Furthermore, compared to state-of-the-art methods, the proposed method achieves the largest time saving with comparable BDBR loss. These findings indicate that our method is superior to other up-to-date methods at lowering VVC intra coding complexity, providing an effective solution for power-constrained applications. Full article
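As a rough sketch of how texture features such as complexity and direction can feed a random forest partition classifier (the feature set, synthetic training data, and labels below are illustrative assumptions, not the authors' trained models), consider:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def texture_features(cu: np.ndarray) -> np.ndarray:
    """Simple texture descriptors for a luma CU: variance (complexity) and
    gradient statistics (direction) from first-order pixel differences."""
    cu = cu.astype(np.float64)
    gx = cu[:, 1:] - cu[:, :-1]            # horizontal differences
    gy = cu[1:, :] - cu[:-1, :]            # vertical differences
    return np.array([
        cu.var(),                                        # texture complexity
        np.abs(gx).mean(),                               # horizontal activity
        np.abs(gy).mean(),                               # vertical activity
        np.abs(gx).mean() / (np.abs(gy).mean() + 1e-6),  # dominant-direction ratio
    ])

# Synthetic stand-in for labels that would normally come from offline RDO results:
# 0 = do not split (flat CUs), 1 = split (busy CUs).
rng = np.random.default_rng(1)
flat_cus = [np.full((32, 32), 100.0) + rng.normal(0, 2, (32, 32)) for _ in range(50)]
busy_cus = [rng.normal(128, 40, (32, 32)) for _ in range(50)]
X = np.stack([texture_features(cu) for cu in flat_cus + busy_cus])
y = np.array([0] * 50 + [1] * 50)

clf = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=0).fit(X, y)
print(clf.predict([texture_features(rng.normal(128, 40, (32, 32)))]))  # likely [1]
```

In a real encoder the classifier's prediction would replace (or terminate early) the RDO search for the corresponding CU, which is where the time saving comes from.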
Figure 1
An illustration of the QTMT partition structure [1].
Figure 2">
Figure 2
Flowchart of the three-step intra mode decision in the VVC reference software.
Figure 3
An example of the quadtree with nested multi-type tree coding block structure.
Figure 4
Neighboring CUs.
Figure 5
Relation between texture direction and gradient.
Figure 6
Flowchart of the proposed fast CU partition decision based on the random forest classifier.
Figure 7
Different CUs with the same variance.
Figure 8
Illustration of the random forest RF_PM or RF_ET.
Figure 9
Flowchart of the hierarchical search method for fast intra mode prediction.
Figure 10
The accuracy of the two random forest classifiers. (a) RF_PM; (b) RF_ET.
Figure 11
Influence of different components in FCPD for different sequences. (a) BasketballDrive; (b) RaceHorses; (c) BasketballPass; (d) Johnny.
Figure 12
The accuracy of FIMP.
Figure 13
Performance results of the proposed FCPD, FIMD, and the overall algorithm compared with the VTM-7.0 encoder on RaceHorsesC. (a) RD curves; (b) time saving under different QPs.
17 pages, 512 KiB  
Article
Fast Coding Unit Partitioning Algorithm for Video Coding Standard Based on Block Segmentation and Block Connection Structure and CNN
by Nana Li, Zhenyi Wang and Qiuwen Zhang
Electronics 2024, 13(9), 1767; https://doi.org/10.3390/electronics13091767 - 2 May 2024
Viewed by 1126
Abstract
The recently introduced video coding standard VVC adopts a novel Quadtree plus Nested Multi-Type Tree (QTMTT) block structure. This structure enables a more flexible block partition and delivers better compression performance than its predecessor, HEVC. However, the new structure leads to a more complex partition search process and thus a considerable increase in time complexity. The QTMTT structure also yields Coding Unit (CU) blocks of diverse sizes, which complicates CNN model inference. In this study, we propose a representation structure termed Block Segmentation and Block Connection (BSC), rooted in texture features, which ensures that partial CU blocks are represented at a uniform size. To handle the different CU sizes, CNN models at several levels are designed for prediction. Moreover, we introduce a post-processing method and a multi-thresholding scheme to further mitigate the errors introduced by the CNNs, allowing flexible and adjustable acceleration that trades off coding time complexity against performance. Experimental results indicate that, in comparison to VTM-10.0, our “Fast” scheme reduces the average complexity by 57.14% with a 1.86% increase in BDBR, while the “Moderate” scheme reduces the average complexity by 50.14% with only a 1.39% increase in BDBR. Full article
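A multi-thresholding scheme of the kind mentioned above can be illustrated with a small sketch (the threshold values and the `split_decision` helper are hypothetical assumptions, not the paper's implementation): a CNN split probability is mapped either to a confident split, a confident non-split, or a fallback to the encoder's full RDO search.

```python
def split_decision(split_prob: float, t_low: float = 0.3, t_high: float = 0.7) -> str:
    """Map a CNN's split probability for a CU to an action.
    Looser thresholds defer more CUs to RDO (slower, smaller BDBR loss);
    tighter thresholds skip more RDO work (faster, larger BDBR loss)."""
    if split_prob >= t_high:
        return "split"      # trust the CNN: skip RDO checks for the non-split option
    if split_prob <= t_low:
        return "no_split"   # trust the CNN: skip RDO checks for the split options
    return "run_rdo"        # uncertain region: let the encoder's full RDO decide

# Different operating points (e.g., a faster vs. a more conservative scheme)
# could simply use different threshold pairs:
print(split_decision(0.82))                           # -> split
print(split_decision(0.55))                           # -> run_rdo
print(split_decision(0.62, t_low=0.45, t_high=0.6))   # aggressive setting -> split
```

Widening or narrowing the [t_low, t_high] band is what makes the acceleration "flexible and adjustable": the band directly controls how many CUs bypass the expensive RDO search.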
(This article belongs to the Special Issue Recent Advances in Image/Video Compression and Coding)
Figure 1
QTMTT structure in VVC.
Figure 2
Flowchart of the overall algorithm (B = CU size represents one of the CU sizes: 64 × 64, 32 × 32, 16 × 16, 32 × 16, 16 × 32, 32 × 8, 8 × 32).
Figure 3
Block segmentation and block connection structure data statistics.
Figure 4
Block segmentation and block connection structure.
Figure 5
The structure of CNN models.
Figure 6
Distribution of datasets (32 × 32 (left), 64 × 64 (center), 16 × 16 (right)).
Figure 7
Prediction accuracy of different threshold models.
Figure 8
The compression performance of the proposed method under different configurations of L1–L3.
Figure 9
Comparison with other methods. The horizontal axis is the BD–BR increase and the vertical axis is the encoding time acceleration factor.