Search Results (160)

Search Parameters:
Keywords = HEVC

16 pages, 433 KiB  
Article
A Fast Coding Unit Partitioning Decision Algorithm for Versatile Video Coding Based on Gradient Feedback Hierarchical Convolutional Neural Network and Light Gradient Boosting Machine Decision Tree
by Fangmei Liu, Jiyuan Wang and Qiuwen Zhang
Electronics 2024, 13(24), 4908; https://doi.org/10.3390/electronics13244908 - 12 Dec 2024
Viewed by 219
Abstract
Video encoding technology is a foundational component in the advancement of modern technological applications. The latest video coding standard, H.266/Versatile Video Coding (VVC), features a quad-tree with nested multi-type tree (QTMT) partitioning structure, which represents an improvement over its predecessor, High-Efficiency Video Coding (H.265/HEVC). This configuration facilitates adaptable block segmentation, albeit at the cost of heightened encoding complexity. With this in mind, this paper puts forth a deep learning-based approach to facilitate CU partitioning, with the aim of supplanting the intricate CU partitioning process observed in the Versatile Video Coding Test Model (VTM). We begin by presenting the Gradient Feedback Hierarchical CNN (GFH-CNN) model, an advanced convolutional neural network derived from the ResNet architecture, enabling the extraction of features from 64 × 64 coding unit (CU) blocks. Following this, a hierarchical network diagram (HND) is crafted to depict the delineation of partition boundaries corresponding to the various levels of the CU block's layered structure. This diagram maps the features extracted by the GFH-CNN model to the partitioning at each level and boundary. Finally, a LightGBM-based decision tree classification model (L-DT) is constructed to predict the corresponding partition structure based on the prediction vector output from the GFH-CNN model. Subsequently, any errors in the partitioning results are corrected in accordance with the encoding constraints specified by the VTM, which ultimately determines the final CU block partitioning. The experimental results demonstrate that, in comparison with VTM-10.0, the proposed algorithm achieves a 48.14% reduction in complexity with only a negligible 0.83% increase in bitrate under the top-three configuration. The top-two configuration yields a higher complexity reduction of 63.78%, accompanied by a 2.08% increase in bitrate. These results demonstrate that, in comparison to existing solutions, our approach provides an optimal balance between encoding efficiency and computational complexity. Full article
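The final L-DT stage described above amounts to a gradient-boosted decision tree acting on CNN-derived feature vectors. The sketch below shows that idea with a LightGBM classifier predicting a split/no-split label for a CU; the feature dimensionality, labels, and random training data are placeholders rather than the authors' actual GFH-CNN outputs or L-DT configuration.

```python
# Minimal sketch: a LightGBM classifier over CNN-derived CU features (illustrative only).
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 64))    # placeholder GFH-CNN prediction vectors
y_train = rng.integers(0, 2, size=1000)  # placeholder labels: 1 = split, 0 = no split

clf = lgb.LGBMClassifier(n_estimators=200, num_leaves=31, learning_rate=0.05)
clf.fit(X_train, y_train)

X_cu = rng.normal(size=(1, 64))          # feature vector for one 64 x 64 CU
print("predict split:", bool(clf.predict(X_cu)[0]))  # encoder would skip RDO checks accordingly
```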
Figure 1: Algorithm flowchart of the GFH-CNN+L-DT model framework.
Figure 2: CTU partitioning in VVC. (a) VVC split types. (b) Example of CTU partitioning in MTT.
Figure 3: CTU partitioning in VVC. (a) VVC split types. (b) Schematic diagram of different levels of HND.
Figure 4: The proposed GFH-CNN model framework takes brightness information as input and outputs probability vectors.
Figure 5: Loss and accuracy rate of the GFH-CNN model.
Figure 6: Comparison of our algorithm with various algorithms [19,36,37].

16 pages, 10696 KiB  
Article
A Framework for Symmetric-Quality S3D Video Streaming Services
by Juhyeon Lee, Seungjun Lee, Sunghoon Kim and Dongwook Kang
Appl. Sci. 2024, 14(23), 11011; https://doi.org/10.3390/app142311011 - 27 Nov 2024
Viewed by 339
Abstract
This paper proposes an efficient encoding framework based on Scalable High Efficiency Video Coding (SHVC) technology, which supports both low- and high-resolution 2D videos as well as stereo 3D (S3D) video simultaneously. Previous studies have introduced Cross-View SHVC, which encodes two videos with different viewpoints and resolutions using a Cross-View SHVC encoder, where the low-resolution video is encoded as the base layer and the other video as the enhancement layer. This encoder provides resolution diversity and allows the decoder to combine the two videos, enabling 3D video services. Even when 3D videos are composed of left and right videos with different resolutions, the viewer tends to perceive the quality based on the higher-resolution video due to the binocular suppression effect, where the brain prioritizes the high-quality image and suppresses the lower-quality one. However, recent experiments have shown that when the disparity between resolutions exceeds a certain threshold, it can lead to a subjective degradation of the perceived 3D video quality. To address this issue, a conditional replenishment algorithm has been studied, which replaces some blocks of the video using a disparity-compensated left-view image based on rate–distortion cost. This conditional replenishment algorithm (also known as VEI technology) effectively reduces the quality difference between the base layer and enhancement layer videos. However, the algorithm alone cannot fully compensate for the quality difference between the left and right videos. In this paper, we propose a novel encoding framework to solve the asymmetry issue between the left and right videos in 3D video services and achieve symmetrical video quality. The proposed framework focuses on improving the quality of the right-view video by combining the conditional replenishment algorithm with Cross-View SHVC. Specifically, the framework leverages the non-HEVC option of the SHVC encoder, using a VEI (Video Enhancement Information) restored image as the base layer to provide higher-quality prediction signals and reduce encoding complexity. Experimental results using animation and live-action UHD sequences show that the proposed method achieves BD-RATE reductions of 57.78% and 45.10% compared with HEVC and SHVC codecs, respectively. Full article
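At its core, the conditional replenishment step is a per-block rate–distortion comparison. The toy function below keeps, for each block, whichever candidate (base-layer reconstruction or disparity-compensated left view) has the lower Lagrangian cost J = D + λR; the block size, SSD distortion, and fixed bit-cost estimates are illustrative assumptions, not the paper's exact formulation.

```python
# Toy conditional replenishment: per-block candidate choice by Lagrangian cost (illustrative).
import numpy as np

def replenish(right_orig, base_rec, warped_left, lam=10.0,
              bits_base=2.0, bits_warped=8.0, block=16):
    """Keep the base-layer block or replace it with the disparity-compensated
    left-view block, whichever has the lower cost J = D + lam * R."""
    out = base_rec.copy()
    h, w = right_orig.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            sl = (slice(y, y + block), slice(x, x + block))
            d_base = float(np.sum((right_orig[sl] - base_rec[sl]) ** 2))
            d_warp = float(np.sum((right_orig[sl] - warped_left[sl]) ** 2))
            if d_warp + lam * bits_warped < d_base + lam * bits_base:
                out[sl] = warped_left[sl]
    return out

# Usage with random stand-in frames (no real video involved):
rng = np.random.default_rng(1)
frame = rng.random((64, 64))
print(replenish(frame, frame + 0.1, frame + 0.02).shape)
```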
Figure 1: SHVC encoder.
Figure 2: Cross-View SHVC encoder for the hybrid 3DTV system.
Figure 3: Cross-View SHVC decoder for the hybrid 3DTV system.
Figure 4: Improved Cross-View SHVC.
Figure 5: Conditional replenishment algorithm.
Figure 6: Conditional replenishment algorithm application results.
Figure 7: VEI stream construction method.
Figure 8: SHVC codec incorporated with VEI.
Figure 9: Proposed encoding framework schematic diagram.
Figure 10: Proposed decoding framework schematic diagram.
Figure 11: Sequence snapshots used in the experiment.
Figure 12: RD curve.

18 pages, 4103 KiB  
Article
Content-Adaptive Bitrate Ladder Estimation in High-Efficiency Video Coding Utilizing Spatiotemporal Resolutions
by Jelena Šuljug and Snježana Rimac-Drlje
Electronics 2024, 13(20), 4049; https://doi.org/10.3390/electronics13204049 - 15 Oct 2024
Viewed by 610
Abstract
The constant increase in multimedia Internet traffic in the form of video streaming requires new solutions for efficient video coding to save bandwidth and network resources. HTTP adaptive streaming (HAS), the most widely used solution for video streaming, allows the client to adaptively select the bitrate according to the transmission conditions. For this purpose, multiple presentations of the same video content are generated on the video server, which contains video sequences encoded at different bitrates with resolution adjustment to achieve the best Quality of Experience (QoE). This set of bitrate–resolution pairs is called a bitrate ladder. In addition to the traditional one-size-fits-all scheme for the bitrate ladder, context-aware solutions have recently been proposed that enable optimum bitrate–resolution pairs for video sequences of different complexity. However, these solutions use only spatial resolution for optimization, while the selection of the optimal combination of spatial and temporal resolution for a given bitrate has not been sufficiently investigated. This paper proposes bit-ladder optimization considering spatiotemporal features of video sequences and usage of optimal spatial and temporal resolution related to video content complexity. Optimization along two dimensions of resolution significantly increases the complexity of the problem and the approach of intensive encoding for all spatial and temporal resolutions in a wide range of bitrates, for each video sequence, is not feasible in real time. In order to reduce the level of complexity, we propose a data augmentation using a neural network (NN)-based model. To train the NN model, we used seven video sequences of different content complexity, encoded with the HEVC encoder at five different spatial resolutions (SR) up to 4K. Also, all video sequences were encoded using four frame rates up to 120 fps, presenting different temporal resolutions (TR). The Structural Similarity Index Measure (SSIM) is used as an objective video quality metric. After data augmentation, we propose NN models that estimate optimal TR and bitrate values as switching points to a higher SR. These results can be further used as input parameters for the bitrate ladder construction for video sequences of a certain complexity. Full article
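A bitrate ladder ultimately reduces to finding, for each bitrate, which spatiotemporal resolution gives the best quality; the switching point is where a higher resolution's quality curve crosses above the current one. The snippet below performs that cross-over search on SSIM-versus-bitrate samples for two resolutions using invented numbers, purely to illustrate the mechanics rather than reproduce the paper's NN-estimated values.

```python
# Illustrative cross-over search: bitrate at which a higher resolution overtakes a lower one.
import numpy as np

bitrates = np.array([1.0, 2.0, 4.0, 8.0, 16.0])           # Mbps (made-up grid)
ssim_1080p = np.array([0.90, 0.94, 0.965, 0.975, 0.980])  # invented sample points
ssim_2160p = np.array([0.85, 0.92, 0.960, 0.978, 0.988])

grid = np.linspace(bitrates[0], bitrates[-1], 500)
q_low = np.interp(grid, bitrates, ssim_1080p)              # piecewise-linear quality curves
q_high = np.interp(grid, bitrates, ssim_2160p)
idx = int(np.argmax(q_high > q_low))                       # first point where 2160p wins
print(f"switch to the higher resolution at ~{grid[idx]:.2f} Mbps")
```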
Figure 1: Spatial and temporal information of video sequences used for model development and evaluation.
Figure 2: Achieved bitrate quality curves for video sequence Beauty.
Figure 3: Flowchart of video coding and NN training process.
Figure 4: The underlying NN architecture.
Figure 5: (a) Regression plot for trained NN; (b) training state plot.
Figure 6: (a) Error histogram plot for trained NN; (b) performance plot.
Figure 7: (a) Regression plot for trained NN_TR; (b) training state plot.
Figure 8: (a) Error histogram plot for trained NN_TR; (b) performance plot.
Figure 9: (a) Regression plot for trained NN_BR; (b) training state plot.
Figure 10: (a) Error histogram plot for trained NN_BR; (b) performance plot.

12 pages, 1792 KiB  
Article
Information Bottleneck Driven Deep Video Compression—IBOpenDVCW
by Timor Leiderman and Yosef Ben Ezra
Entropy 2024, 26(10), 836; https://doi.org/10.3390/e26100836 - 30 Sep 2024
Viewed by 817
Abstract
Video compression remains a challenging task despite significant advancements in end-to-end optimized deep networks for video coding. This study, inspired by information bottleneck (IB) theory, introduces a novel approach that combines IB theory with wavelet transform. We perform a comprehensive analysis of information and mutual information across various mother wavelets and decomposition levels. Additionally, we replace the conventional average pooling layers with a discrete wavelet transform creating more advanced pooling methods to investigate their effects on information and mutual information. Our results demonstrate that the proposed model and training technique outperform existing state-of-the-art video compression methods, delivering competitive rate-distortion performance compared to the AVC/H.264 and HEVC/H.265 codecs. Full article
(This article belongs to the Section Information Theory, Probability and Statistics)
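One concrete way to replace an average-pooling layer with a wavelet-based one is to apply a single-level 2D DWT and keep only the approximation band, which roughly halves each spatial dimension just like 2 × 2 pooling. The sketch below does this with PyWavelets; it is a minimal stand-in under that assumption, not the IBOpenDVCW layer itself.

```python
# Minimal DWT "pooling": keep the approximation sub-band of a single-level 2D wavelet transform.
import numpy as np
import pywt

def dwt_pool(feature_map, wavelet="haar"):
    """Down-sample a 2D feature map by ~2x per dimension via the DWT approximation band."""
    cA, (cH, cV, cD) = pywt.dwt2(feature_map, wavelet)
    return cA  # the detail bands cH/cV/cD are simply discarded in this toy version

x = np.random.rand(32, 32)
print(dwt_pool(x).shape)          # (16, 16) with Haar
print(dwt_pool(x, "db2").shape)   # slightly larger because of filter padding
```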
Figure 1: High-level framework of the OpenDVCW network.
Figure 2: Pyramid architecture of the optical flow estimation.
Figure 3: Visualization of DWT as we apply the transform on the approximation on every iteration.
Figure 4: Calculated information on Lenna image for various mother wavelets.
Figure 5: Performance comparison on the UVG dataset between AVC/H.264, HEVC/H.265, VVC/H.266 and OpenDVCW with Db2, Sym3 and Haar wavelets for the DWT transform in the optical flow. (A) Beauty, (B) HoneyBee, (C) ShakeNDry, (D) Bosphorus.

24 pages, 6380 KiB  
Article
Multi-Type Self-Attention-Based Convolutional-Neural-Network Post-Filtering for AV1 Codec
by Woowoen Gwun, Kiho Choi and Gwang Hoon Park
Mathematics 2024, 12(18), 2874; https://doi.org/10.3390/math12182874 - 15 Sep 2024
Viewed by 789
Abstract
Over the past few years, there has been substantial interest and research activity surrounding the application of Convolutional Neural Networks (CNNs) for post-filtering in video coding. Most current research efforts have focused on using CNNs with various kernel sizes for post-filtering, primarily concentrating on High-Efficiency Video Coding/H.265 (HEVC) and Versatile Video Coding/H.266 (VVC). This narrow focus has limited the exploration and application of these techniques to other video coding standards such as AV1, developed by the Alliance for Open Media, which offers excellent compression efficiency, reducing bandwidth usage and improving video quality, making it highly attractive for modern streaming and media applications. This paper introduces a novel approach that extends beyond traditional CNN methods by integrating three different self-attention layers into the CNN framework. Applied to the AV1 codec, the proposed method significantly improves video quality by incorporating these distinct self-attention layers. This enhancement demonstrates the potential of self-attention mechanisms to revolutionize post-filtering techniques in video coding beyond the limitations of convolution-based methods. The experimental results show that the proposed network achieves an average BD-rate reduction of 10.40% for the Luma component and 19.22% and 16.52% for the Chroma components compared to the AV1 anchor. Visual quality assessments further validated the effectiveness of our approach, showcasing substantial artifact reduction and detail enhancement in videos. Full article
(This article belongs to the Special Issue New Advances and Applications in Image Processing and Computer Vision)
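A channel-wise self-attention block of the kind combined with convolutions here can be written in a few lines of PyTorch. The layer below is a generic squeeze-style channel attention followed by a 3 × 3 convolution, included only to illustrate the mechanism; it is not the paper's MTSA design.

```python
# Generic channel-attention + convolution block (illustrative, not the proposed MTSA layers).
import torch
import torch.nn as nn

class ChannelAttentionConv(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # squeeze spatial dimensions
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                   # per-channel weights in [0, 1]
        )
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return self.conv(x * self.attn(x))                  # reweight channels, then convolve

x = torch.randn(1, 32, 64, 64)                              # dummy decoded-frame features (N, C, H, W)
print(ChannelAttentionConv(32)(x).shape)                    # torch.Size([1, 32, 64, 64])
```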
Figure 1: (a) Illustration showing where the in-loop filter is located in the video codec pipeline; (b) illustration showing where the post-filter is located in the pipeline.
Figure 2: Proposed MTSA-based CNN.
Figure 3: (a) RCB; (b) CWSA.
Figure 4: (a) Simplified feature map with channel size of 3 and height and width sizes of 4; (b) feature map unfolded into smaller blocks; (c) feature map permuted and reshaped.
Figure 5: (a) BWSSA; (b) PWSA.
Figure 6: R-D curves by SVT-AV1 and MTSA. (a) class A1; (b) class A2; (c) class A3; (d) class A4; (e) class A5.
Figure 7: Example sequence of Class A1 PierSeaSide. (a) Original image from the AVM-CTC sequence; (b) detail inside the yellow box from (a) in the original image; (c) detail inside the yellow box from (a) in the compressed image using SVT-AV1 with QP55; (d) detail inside the yellow box from (a) after applying the post-filter using the proposed network.
Figure 8: Example sequence of Class A1 Tango. (a) Original image from the AVM-CTC sequence; (b) detail inside the yellow box from (a) in the original image; (c) detail inside the yellow box from (a) in the compressed image using SVT-AV1 with QP55; (d) detail inside the yellow box from (a) after applying the post-filter using the proposed network.
Figure 9: Example sequence of Class A2 RushFieldCuts. (a) Original image from the AVM-CTC sequence; (b) detail inside the yellow box from (a) in the original image; (c) detail inside the yellow box from (a) in the compressed image using SVT-AV1 with QP43; (d) detail inside the yellow box from (a) after applying the post-filter using the proposed network.
Figure 10: Methods to handle empty spaces for edge patches; (a) empty spaces filled with zero value; (b) empty spaces filled with edge pixel value extended.
Figure 11: Network wrongly turning edge pixels into darker values; (a) pixel value difference between the original video frame and the AV1-encoded frame; (b) pixel value difference between the original video frame and the AV1-encoded frame processed by the proposed network, with larger positive pixel differences in Y indicating that the processed frame is darker, at the bottom of the image.

26 pages, 7340 KiB  
Article
Versatile Video Coding-Post Processing Feature Fusion: A Post-Processing Convolutional Neural Network with Progressive Feature Fusion for Efficient Video Enhancement
by Tanni Das, Xilong Liang and Kiho Choi
Appl. Sci. 2024, 14(18), 8276; https://doi.org/10.3390/app14188276 - 13 Sep 2024
Viewed by 1006
Abstract
Advanced video codecs such as High Efficiency Video Coding/H.265 (HEVC) and Versatile Video Coding/H.266 (VVC) are vital for streaming high-quality online video content, as they compress and transmit data efficiently. However, these codecs can occasionally degrade video quality by adding undesirable artifacts such as blockiness, blurriness, and ringing, which can detract from the viewer’s experience. To ensure a seamless and engaging video experience, it is essential to remove these artifacts, which improves viewer comfort and engagement. In this paper, we propose a deep feature fusion based convolutional neural network (CNN) architecture (VVC-PPFF) for post-processing approach to further enhance the performance of VVC. The proposed network, VVC-PPFF, harnesses the power of CNNs to enhance decoded frames, significantly improving the coding efficiency of the state-of-the-art VVC video coding standard. By combining deep features from early and later convolution layers, the network learns to extract both low-level and high-level features, resulting in more generalized outputs that adapt to different quantization parameter (QP) values. The proposed VVC-PPFF network achieves outstanding performance, with Bjøntegaard Delta Rate (BD-Rate) improvements of 5.81% and 6.98% for luma components in random access (RA) and low-delay (LD) configurations, respectively, while also boosting peak signal-to-noise ratio (PSNR). Full article
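The central idea, fusing shallow and deep convolutional features before reconstructing the enhanced frame, can be sketched as a tiny residual network that concatenates an early feature map with a later one. Layer counts and channel widths below are placeholders, not the VVC-PPFF architecture.

```python
# Toy early/late feature-fusion post-filter (placeholder sizes, not the actual VVC-PPFF model).
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    def __init__(self, ch: int = 32):
        super().__init__()
        self.early = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.deep = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
            for _ in range(4)
        ])
        self.fuse = nn.Conv2d(2 * ch, 3, 3, padding=1)   # concatenate early + deep, predict a residual

    def forward(self, decoded):
        e = self.early(decoded)
        d = self.deep(e)
        return decoded + self.fuse(torch.cat([e, d], dim=1))  # enhanced frame

print(FusionNet()(torch.randn(1, 3, 64, 64)).shape)
```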
Figure 1: Enhancing video quality with CNN-based post-processing in the conventional VVC coding workflow.
Figure 2: MP4 to YUV conversion and reconstruction using VVenC and VVdeC.
Figure 3: Illustration of the video-to-image conversion process: (a) original videos converted to original images using FFmpeg, and (b) reconstructed videos converted to reconstructed images using FFmpeg.
Figure 4: Illustration of the conversion process from YUV 4:2:0 format to YUV 4:4:4 format before feeding data into the deep learning network.
Figure 5: Illustration of the down-sampling process of the neural network output from YUV 4:4:4 to YUV 4:2:0 format.
Figure 6: Architecture of the proposed CNN-based post-filtering method, integrating multiple feature extractions for enhanced output refinement.
Figure 7: Comparative visualization of (b) reconstructed frames from anchor VVC and (c) proposed methods for the DaylightRoad2 sequence at QP 42 for the RA configuration, alongside (a) the original uncompressed reference frame.
Figure 8: Comparative visualization of (b) reconstructed frames from anchor VVC and (c) proposed methods for the FourPeople sequence at QP 42 for the LD configuration, alongside (a) the original uncompressed reference frame.
Figure 9: RD curve performance comparison for five different test sequences in the RA configuration.
Figure 10: RD curve performance comparison for four different test sequences in the LD configuration.
Figure 11: Visual quality comparison of the proposed method with 8 feature extraction blocks for RA and LD scenarios at QP 42: (a) MarketPlace sequence and (b) PartyScene sequence.
Figure 12: Visual quality comparison of the proposed method with 12 feature extraction blocks for RA and LD scenarios at QP 42: (a) RitualDance sequence and (b) Cactus sequence.

34 pages, 2908 KiB  
Article
A Hybrid Contrast and Texture Masking Model to Boost High Efficiency Video Coding Perceptual Rate-Distortion Performance
by Javier Ruiz Atencia, Otoniel López-Granado, Manuel Pérez Malumbres, Miguel Martínez-Rach, Damian Ruiz Coll, Gerardo Fernández Escribano and Glenn Van Wallendael
Electronics 2024, 13(16), 3341; https://doi.org/10.3390/electronics13163341 - 22 Aug 2024
Viewed by 623
Abstract
As most of the videos are destined for human perception, many techniques have been designed to improve video coding based on how the human visual system perceives video quality. In this paper, we propose the use of two perceptual coding techniques, namely contrast masking and texture masking, jointly operating under the High Efficiency Video Coding (HEVC) standard. These techniques aim to improve the subjective quality of the reconstructed video at the same bit rate. For contrast masking, we propose the use of a dedicated weighting matrix for each block size (from 4×4 up to 32×32), unlike the HEVC standard, which only defines an 8×8 weighting matrix which it is upscaled to build the 16×16 and 32×32 weighting matrices (a 4×4 weighting matrix is not supported). Our approach achieves average Bjøntegaard Delta-Rate (BD-rate) gains of between 2.5% and 4.48%, depending on the perceptual metric and coding mode used. On the other hand, we propose a novel texture masking scheme based on the classification of each coding unit to provide an over-quantization depending on the coding unit texture level. Thus, for each coding unit, its mean directional variance features are computed to feed a support vector machine model that properly predicts the texture type (plane, edge, or texture). According to this classification, the block’s energy, the type of coding unit, and its size, an over-quantization value is computed as a QP offset (DQP) to be applied to this coding unit. By applying both techniques in the HEVC reference software, an overall average of 5.79% BD-rate gain is achieved proving their complementarity. Full article
(This article belongs to the Special Issue Recent Advances in Image/Video Compression and Coding)
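As a rough picture of the texture-masking side, the sketch below computes simple directional variances for a block and feeds them to an SVM that labels the block as plane, edge, or texture. The feature definition, training data, and labels are stand-ins; they are not the mean directional variance features or the trained models from the paper.

```python
# Illustrative directional-variance features + SVM texture classifier (stand-in definitions).
import numpy as np
from sklearn.svm import SVC

def directional_variances(block):
    """Variance of first differences along four directions (a crude MDV-like descriptor)."""
    b = block.astype(float)
    return np.array([
        np.var(np.diff(b, axis=1)),        # horizontal
        np.var(np.diff(b, axis=0)),        # vertical
        np.var(b[1:, 1:] - b[:-1, :-1]),   # 45-degree diagonal
        np.var(b[1:, :-1] - b[:-1, 1:]),   # 135-degree diagonal
    ])

rng = np.random.default_rng(2)
X = rng.random((300, 4))                   # placeholder feature vectors
y = rng.integers(0, 3, size=300)           # placeholder labels: 0 plane, 1 edge, 2 texture
svm = SVC(kernel="rbf").fit(X, y)

block = rng.integers(0, 256, size=(16, 16))
print(svm.predict(directional_variances(block).reshape(1, -1)))
```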
Figure 1: Default HEVC quantization weighting matrices.
Figure 2: Contrast sensitivity function. The red curve represents the original CSF as defined by Equation (1), while the blue dashed curve represents the flattened CSF, with spatial frequencies below the peak sensitivity saturated.
Figure 3: Proposed 4 × 4 quantization weighting matrices for intra- and interprediction modes.
Figure 4: Rate-distortion curves comparing our proposed CSF with the default implemented in the HEVC standard using different perceptual metrics. (a,b) correspond to the BQTerrace sequence of class B, while (c,d) correspond to the ChinaSpeed sequence of class F.
Figure 5: Samples of manually classified blocks (left-hand side) and their associated polar diagram of the MDV metric (right-hand side). From top to bottom: 8 × 8, 16 × 16, and 32 × 32 block sizes; from left- to right-hand side: plain, edge, and texture blocks.
Figure 6: (a) Scatter plot of manually classified 16 × 16 blocks (training dataset), and (b) the classification results provided by the trained SVM model (testing dataset).
Figure 7: Example of block classification for the first frame of sequence BasketballDrill, using optimal SVM models for each block size.
Figure 8: Box and whisker plot of the block energy (ε) distribution by size and texture classification.
Figure 9: Representation of Equation (6) for two sets of function parameters: (red) MinE1, MaxE1, and MaxQStep1, and (blue) MinE2, MaxE2, and MaxQStep2. ΔQStep_i,j is different for each set.
Figure 10: Flowchart of candidate selection for brute-force analysis of perceptually optimal parameters. The Ps in the energy range boxes refer to the percentile.
Figure 11: BD-rate curves (MS-SSIM metric) for the PeopleOnStreet video test sequence over the MaxQStep parameter when modifying texture blocks of size 8. Each curve represents a different block energy range (MinE and MaxE).
Figure 12: Rate-distortion curves of the first frame of the BQSquare sequence, comparing our proposed contrast masking (red line) and contrast and texture masking (yellow line) with the HM reference coding (blue line), using the (a) SSIM, (b) MS-SSIM, and (c) PSNR-HVS-M perceptual metrics.
Figure 13: Visual comparison of the first frame of the BQSquare sequence encoded at QP = 22. (a) HM reference-encoded frame; (b) frame encoded with contrast and texture masking.
Figure A1: Traffic 2560 × 1600 30 fps Class A.
Figure A2: PeopleOnStreet 2560 × 1600 30 fps Class A.
Figure A3: NebutaFestival 2560 × 1600 60 fps Class A.
Figure A4: SteamLocomotiveTrain 2560 × 1600 60 fps Class A.
Figure A5: Kimono 1920 × 1080 24 fps Class B.
Figure A6: ParkScene 1920 × 1080 24 fps Class B.
Figure A7: Cactus 1920 × 1080 50 fps Class B.
Figure A8: BQTerrace 1920 × 1080 60 fps Class B.
Figure A9: BasketballDrive 1920 × 1080 50 fps Class B.
Figure A10: RaceHorses 832 × 480 30 fps Class C.
Figure A11: BQMall 832 × 480 60 fps Class C.
Figure A12: PartyScene 832 × 480 50 fps Class C.
Figure A13: BasketballDrill 832 × 480 50 fps Class C.
Figure A14: RaceHorses 416 × 240 30 fps Class D.
Figure A15: BQSquare 416 × 240 60 fps Class D.
Figure A16: BlowingBubbles 416 × 240 50 fps Class D.
Figure A17: BasketballPass 416 × 240 50 fps Class D.
Figure A18: FourPeople 1280 × 720 60 fps Class E.
Figure A19: Johnny 1280 × 720 60 fps Class E.
Figure A20: KristenAndSara 1280 × 720 60 fps Class E.
Figure A21: BasketballDrillText 832 × 480 50 fps Class F.
Figure A22: ChinaSpeed 1024 × 768 30 fps Class F.
Figure A23: SlideEditing 1280 × 720 30 fps Class F.
Figure A24: SlideShow 1280 × 720 20 fps Class F.

19 pages, 7973 KiB  
Article
Determining Thresholds for Optimal Adaptive Discrete Cosine Transformation
by Alexander Khanov, Anastasija Shulzhenko, Anzhelika Voroshilova, Alexander Zubarev, Timur Karimov and Shakeeb Fahmi
Algorithms 2024, 17(8), 366; https://doi.org/10.3390/a17080366 - 21 Aug 2024
Viewed by 627
Abstract
The discrete cosine transform (DCT) is widely used for image and video compression. Lossy algorithms such as JPEG, WebP, BPG and many others are based on it. Multiple modifications of DCT have been developed to improve its performance. One of them is adaptive DCT (ADCT) designed to deal with heterogeneous image structure and it may be found, for example, in the HEVC video codec. Adaptivity means that the image is divided into an uneven grid of squares: smaller ones retain information about details better, while larger squares are efficient for homogeneous backgrounds. The practical use of adaptive DCT algorithms is complicated by the lack of optimal threshold search algorithms for image partitioning procedures. In this paper, we propose a novel method for optimal threshold search in ADCT using a metric based on tonal distribution. We define two thresholds: pm, the threshold defining solid mean coloring, and ps, defining the quadtree fragment splitting. In our algorithm, the values of these thresholds are calculated via polynomial functions of the tonal distribution of a particular image or fragment. The polynomial coefficients are determined using the dedicated optimization procedure on the dataset containing images from the specific domain, urban road scenes in our case. In the experimental part of the study, we show that ADCT allows a higher compression ratio compared to non-adaptive DCT at the same level of quality loss, up to 66% for acceptable quality. The proposed algorithm may be used directly for image compression, or as a core of video compression framework in traffic-demanding applications, such as urban video surveillance systems. Full article
(This article belongs to the Special Issue Algorithms for Image Processing and Machine Vision)
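The quadtree logic the two thresholds control can be sketched recursively: a fragment flat enough (below pm) is replaced by its mean tone, a sufficiently simple fragment is DCT-coded as a whole, and anything else is split into four. The variance criterion and fixed threshold values below are placeholders for the paper's tonal-distribution metric and polynomial threshold functions.

```python
# Recursive adaptive-DCT quadtree sketch (variance stands in for the tonal-distribution metric).
import numpy as np
from scipy.fft import dctn

def adct(block, pm=5.0, ps=400.0, min_size=8):
    """Return (y, x, size, kind, payload) leaves for one square image block."""
    def rec(b, y, x):
        size = b.shape[0]
        v = float(np.var(b))
        if v < pm:                                    # near-solid fragment: store the mean tone only
            return [(y, x, size, "mean", float(b.mean()))]
        if v < ps or size <= min_size:                # simple enough: DCT-code the whole fragment
            return [(y, x, size, "dct", dctn(b, norm="ortho"))]
        h = size // 2                                 # otherwise split into four sub-fragments
        return (rec(b[:h, :h], y, x) + rec(b[:h, h:], y, x + h) +
                rec(b[h:, :h], y + h, x) + rec(b[h:, h:], y + h, x + h))
    return rec(block.astype(float), 0, 0)

img = np.random.rand(64, 64) * 255
print(len(adct(img)), "leaves")
```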
Figure 1: Examples of video compression using 3D DCT, comparing the original video frame and various compression techniques, with variable-temporal-length 3D DCT implementations in the bottom row, clearly showing its superiority.
Figure 2: Example of adaptive 3D DCT in a video frame, showcasing both fragmenting and replacing entire fragments with their average tones.
Figure 3: Example of adaptive discrete cosine transform: (a) original image; (b) quadtree grid on the image, yellow color marks the solid color fragments; (c) adaptive discrete cosine transform spectrum of the image (brightened up for clarity).
Figure 4: Examples of image histograms: (a) histogram of the original image; (b) histogram after increasing the brightness by 40%; (c) histogram after normalization, i.e., extending it to the entire 0–255 spectrum width; (d) histogram after multiplying the brightness by 2, resulting in a "whiteout" (overexposure).
Figure 5: Example of solid color fragments: (a) processed image; (b) processed image: solid color fragments are marked with yellow; (c) close-ups of the sky and the road surface demonstrate little impact on the overall perception unless zoomed in.
Figure 6: Image quality comparison: (a) the original images; (b) high-quality ADCT; (c) medium-quality ADCT; (d) low-quality ADCT.
Figure 7: Flowchart of the optimization process; the grid is the search area; each grid cell is a pair of threshold values.
Figure 8: Optimal threshold values for different original ITDV values of the images.
Figure 9: Examples of the test dataset frames.
Figure 10: Distribution of ITDV values through all images of the considered dataset.
Figure 11: Comparison of resulting MS-SSIM and compression ratio after ADCT and non-adaptive DCT for ITDV values from 9 to 16.
Figure 12: Graphical comparison of proposed ADCT with non-adaptive DCT in terms of approximated bit rate vs. MS-SSIM.
Figure 13: Comparison [25] with the existing algorithms.
Figure 14: Advantage in low target quality compression of ADCT over DCT and preservation of information about vehicles in the scene: (a) original image; (b) fragmentation quadtree of ADCT and grid of DCT (yellow fragments are solid color); (c) resulting images (details about car positions are preserved).
Figure A1: Highway footage frame test: (a) The original image. (b) High-quality ADCT and DCT. (c) Medium-quality ADCT and DCT. (d) Low-quality ADCT and DCT.
Figure A2: Recorder footage frame test: (a) The original image. (b) High-quality ADCT and DCT. (c) Medium-quality ADCT and DCT. (d) Low-quality ADCT and DCT.
Figure A3: Dark highway photo test: (a) The original image. (b) High-quality ADCT and DCT. (c) Medium-quality ADCT and DCT. (d) Low-quality ADCT and DCT.
Figure A4: Sunny highway photo test: (a) The original image. (b) High-quality ADCT and DCT. (c) Medium-quality ADCT and DCT. (d) Low-quality ADCT and DCT.

16 pages, 10945 KiB  
Article
Impact of Video Motion Content on HEVC Coding Efficiency
by Khalid A. M. Salih, Ismail Amin Ali and Ramadhan J. Mstafa
Computers 2024, 13(8), 204; https://doi.org/10.3390/computers13080204 - 18 Aug 2024
Viewed by 968
Abstract
Digital video coding aims to reduce the bitrate and keep the integrity of visual presentation. High-Efficiency Video Coding (HEVC) can effectively compress video content to be suitable for delivery over various networks and platforms. Finding the optimal coding configuration is challenging as the compression performance highly depends on the complexity of the encoded video sequence. This paper evaluates the effects of motion content on coding performance and suggests an adaptive encoding scheme based on the motion content of encoded video. To evaluate the effects of motion content on the compression performance of HEVC, we tested three coding configurations with different Group of Pictures (GOP) structures and intra refresh mechanisms. Namely, open GOP IPPP, open GOP Periodic-I, and closed GOP periodic-IDR coding structures were tested using several test sequences with a range of resolutions and motion activity. All sequences were first tested to check their motion activity. The rate–distortion curves were produced for all the test sequences and coding configurations. Our results show that the performance of IPPP coding configuration is significantly better (up to 4 dB) than periodic-I and periodic-IDR configurations for sequences with low motion activity. For test sequences with intermediate motion activity, IPPP configuration can still achieve a reasonable quality improvement over periodic-I and periodic-IDR configurations. However, for test sequences with high motion activity, IPPP configuration has a very small performance advantage over periodic-I and periodic-IDR configurations. Our results indicate the importance of selecting the appropriate coding structure according to the motion activity of the video being encoded. Full article
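A simple way to gauge motion activity before picking a coding structure is to average optical-flow magnitudes between consecutive frames and compare against cut-off values. The snippet below does this with OpenCV's Farnebäck flow; the thresholds and the flow-based metric are illustrative assumptions, not the MVpp measure or the decision rules from the paper.

```python
# Illustrative motion-activity check driving a GOP-structure choice (thresholds are made up).
import cv2
import numpy as np

def avg_motion(prev_gray, curr_gray):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    return float(np.mean(np.linalg.norm(flow, axis=2)))  # mean motion magnitude per pixel

def pick_config(motion):
    if motion < 0.5:        # low motion: open GOP IPPP pays off the most
        return "IPPP"
    if motion < 2.0:        # intermediate motion
        return "Periodic-I"
    return "Periodic-IDR"   # high motion: little gain from IPPP anyway

prev = np.random.randint(0, 256, (240, 416), dtype=np.uint8)  # stand-in grayscale frames
curr = np.roll(prev, 2, axis=1)
print(pick_config(avg_motion(prev, curr)))
```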
Figure 1: HEVC video coding encoder [23].
Figure 2: HEVC coding units.
Figure 3: HEVC coding blocks.
Figure 4: Proposed framework.
Figure 5: Snapshots of test sequences used [38,39].
Figure 6: Configurations tested.
Figure 7: Average MVpp of tested video sequences.
Figure 8: Frame 90 of Crowd_run, showing motion vectors as white lines.
Figure 9: Frame 90 of HoneyBee, showing motion vectors as short red lines.
Figure 10: Rate–distortion curve for the HoneyBee test sequence.
Figure 11: Rate–distortion curve for the Sunflower test sequence.
Figure 12: Rate–distortion curve for the FourPeople test sequence.
Figure 13: Rate–distortion curve for the Mobcal test sequence.
Figure 14: Rate–distortion curve for the Shields test sequence.
Figure 15: Rate–distortion curve for the YachtRide test sequence.
Figure 16: Rate–distortion curve for the Ducks_take_off test sequence.
Figure 17: Rate–distortion curve for the Crowd_run test sequence.
Figure 18: Encoding time.
Figure 19: Decoding time.

12 pages, 1161 KiB  
Article
Investigating the Hepatitis E Virus (HEV) Diversity in Rat Reservoirs from Northern Italy
by Luca De Sabato, Marina Monini, Roberta Galuppi, Filippo Maria Dini, Giovanni Ianiro, Gabriele Vaccari, Fabio Ostanello and Ilaria Di Bartolo
Pathogens 2024, 13(8), 633; https://doi.org/10.3390/pathogens13080633 - 29 Jul 2024
Cited by 1 | Viewed by 1287
Abstract
Hepatitis E virus belonging to the Rocahepevirus ratti species, genotype HEV-C1, has been extensively reported in rats in Europe, Asia and North America. Recently, human cases of hepatitis associated with HEV-C1 infection have been reported, but the zoonotic nature of rat-HEV remains controversial. The transmission route of rat-HEV is unidentified and requires further investigation. The HEV strains of the Paslahepevirus balayani species, belonging to the same Hepeviridae family, and including the zoonotic genotype HEV-3 usually found in pigs, have also sporadically been identified in rats. We sampled 115 rats (liver, lung, feces) between 2020 and 2023 in Northeast Italy and the HEV detection was carried out by using Reverse Transcription PCR. HEV RNA was detected in 3/115 (2.6%) rats who tested positive for HEV-C1 strains in paired lung, intestinal contents and liver samples. Overall, none tested positive for the Paslahepevirus balayani strains. In conclusion, our results confirm the presence of HEV-rat in Italy with a prevalence similar to previous studies but show that there is a wide heterogeneity of strains in circulation. The detection of HEV-C1 genotype of Rocahepevirus ratti species in some human cases of acute hepatitis suggests that HEV-C1 may be an underestimated source of human infections. This finding, with the geographically widespread detection of HEV-C1 in rats, raises questions about the role of rats as hosts for both HEV-C1 and HEV-3 and the possibility of zoonotic transmission. Full article
Figure 1: Phylogenetic analysis based on the 270 nt fragment of the partial RdRp region within ORF1 of the 9 sequences obtained in this study (entries highlighted in bold and indicated by black dots), 124 HEV-C1 sequences obtained from the NCBI database by BLASTn searches, and two HEV-C2 sequences used as an outgroup. The maximum likelihood tree was produced using the TIM2 model (Transition model 2) with invariant sites and gamma distribution based on 1000 bootstrap replications, with bootstrap values >70 indicated at their respective nodes. Sequence entries are reported as GenBank Accession Number, Country and Host species. On the right side, sequences belonging to the G1–G3 group of HEV-C1 are indicated.
Figure 2: Alignment of amino acid sequences and secondary structure elements of HEV capsid proteins. The first line indicates the PDB number (2ZTN) of the capsid protein from the HEV-3 strain along with its secondary structure elements. On the left, the names of strains analyzed in the capsid proteins are indicated with accession numbers, genotypes, and subtypes. The capsid proteins of the strains sequenced in this study are indicated by the names RatHEV119IT21 and RatHEV115IT21. Spiral lines indicate helices, while arrows represent β strands. White characters in red boxes represent strictly conserved residues and red characters represent stereochemically identical residues.

19 pages, 746 KiB  
Article
Fast Depth Map Coding Algorithm for 3D-HEVC Based on Gradient Boosting Machine
by Xiaoke Su, Yaqiong Liu and Qiuwen Zhang
Electronics 2024, 13(13), 2586; https://doi.org/10.3390/electronics13132586 - 1 Jul 2024
Viewed by 995
Abstract
Three-Dimensional High-Efficiency Video Coding (3D-HEVC) has been extensively researched due to its efficient compression and deep image representation, but encoding complexity continues to pose a difficulty. This is mainly attributed to redundancy in the coding unit (CU) recursive partitioning process and rate–distortion (RD) cost calculation, resulting in a complex encoding process. Therefore, enhancing encoding efficiency and reducing redundant computations are key objectives for optimizing 3D-HEVC. This paper introduces a fast-encoding method for 3D-HEVC, comprising an adaptive CU partitioning algorithm and a rapid rate–distortion-optimization (RDO) algorithm. Based on the ALV features extracted from each coding unit, a Gradient Boosting Machine (GBM) model is constructed to obtain the corresponding CU thresholds. These thresholds are compared with the ALV to further decide whether to continue dividing the coding unit. The RDO algorithm is used to optimize the RD cost calculation process, selecting the optimal prediction mode as much as possible. The simulation results show that this method saves 52.49% of complexity while ensuring good video quality. Full article
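The ALV-versus-threshold decision can be pictured as follows: a gradient-boosting regressor (standing in for the paper's GBM model) maps a few CU descriptors to a variance threshold, and partitioning stops once the CU's average local variance falls below it. The feature choice, training data, and local-variance window are all illustrative assumptions.

```python
# Sketch: GBM-predicted threshold vs. average local variance (ALV) of a depth-map CU.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def average_local_variance(cu, win=4):
    """Mean of the variances of non-overlapping win x win sub-blocks (a simple ALV proxy)."""
    h, w = cu.shape
    return float(np.mean([np.var(cu[y:y + win, x:x + win])
                          for y in range(0, h, win) for x in range(0, w, win)]))

rng = np.random.default_rng(3)
X = rng.random((500, 3))                 # placeholder CU descriptors (e.g., size, QP, depth level)
y = rng.random(500) * 50                 # placeholder threshold targets
gbm = GradientBoostingRegressor().fit(X, y)

cu = rng.integers(0, 256, (32, 32)).astype(float)
threshold = gbm.predict([[0.5, 0.5, 0.3]])[0]
print("stop splitting" if average_local_variance(cu) < threshold else "keep splitting")
```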
Figure 1: Example of quadtree division from CTU to CU.
Figure 2: Predictive mode diagram of 3D-HEVC.
Figure 3: Analysis of encoding unit complexity.
Figure 4: Process of local variance calculation.
Figure 5: Cumulative distribution of CU sizes in depth maps for ALV.
Figure 6: Depicting the method flow based on the GBM model.
Figure 7: Threshold curves for CU sizes in 3D-HEVC and their modeling functions.
Figure 8: Proportion of time consumption for bitrate prediction, depth distortion calculation, and SVD calculation in different sequences.
Figure 9: Flowchart of the fast RDO algorithm.
Figure 10: Experimental results of RD trajectory.

16 pages, 1739 KiB  
Article
Light-Field Image Compression Based on a Two-Dimensional Prediction Coding Structure
by Jianrui Shao, Enjian Bai, Xueqin Jiang and Yun Wu
Information 2024, 15(6), 339; https://doi.org/10.3390/info15060339 - 7 Jun 2024
Cited by 1 | Viewed by 951
Abstract
Light-field images (LFIs) are gaining increased attention within the field of 3D imaging, virtual reality, and digital refocusing, owing to their wealth of spatial and angular information. The escalating volume of LFI data poses challenges in terms of storage and transmission. To address this problem, this paper introduces an MSHPE (most-similar hierarchical prediction encoding) structure based on light-field multi-view images. By systematically exploring the similarities among sub-views, our structure obtains residual views through the subtraction of the encoded view from its corresponding reference view. Regarding the encoding process, this paper implements a new encoding scheme to process all residual views, achieving lossless compression. High-efficiency video coding (HEVC) is applied to encode select residual views, thereby achieving lossy compression. Furthermore, the introduced structure is conceptualized as a layered coding scheme, enabling progressive transmission and showing good random access performance. Experimental results demonstrate the superior compression performance attained by encoding residual views according to the proposed structure, outperforming alternative structures. Notably, when HEVC is employed for encoding residual views, significant bit savings are observed compared to the direct encoding of original views. The final restored view presents better detail quality, reinforcing the effectiveness of this approach. Full article
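The residual-view construction at the heart of the scheme is a subtraction of each sub-view from its assigned reference view. The snippet below forms residuals against the centre view of a toy 3 × 3 view grid and shows the lossless round trip; using a single reference is a simplification of the paper's most-similar hierarchical reference assignment.

```python
# Simplified residual-view construction for a light-field view grid (centre view as reference).
import numpy as np

rng = np.random.default_rng(4)
views = rng.integers(0, 256, size=(3, 3, 64, 64)).astype(np.int16)  # toy 3x3 grid of sub-views
ref = views[1, 1]                            # centre view is kept as-is and used as the reference

residuals = views - ref                      # every view becomes a (signed) residual view
recovered = residuals + ref                  # decoder side: add the reference back
assert np.array_equal(recovered, views)      # lossless round trip
print(residuals.min(), residuals.max())      # residuals shrink as the sub-views get more similar
```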
Figure 1: Two light-field representations: (a) MI, (b) SAIs.
Figure 2: (a) PSNR value between the center view and other views, (b) representation of the prediction structure.
Figure 3: Reference view and coding sequence.
Figure 4: Two pixel histograms of residual plots: (a) histogram of medieval's residual, (b) histogram of corner's residual.
Figure 5: Progressive transmission diagram.
Figure 6: The presentation of the test images. (a) Cotton, (b) dino, (c) antinous, (d) medieval, (e) kitchen, (f) boardgames, (g) car, (h) room, and (i) corner.
Figure 7: The presentation of the five different prediction structures and the coding sequence and hierarchical diagram of the proposed structures. (a) Spiral, (b) raster, (c) MVI, (d) 2-DHS, (e) 2-DKS, coding order and hierarchical diagram (the different colors represent the different layers, and the numbers represent the encoding order).
Figure 8: Rate–distortion curves for the different difference sequences encoded using HEVC: (a) medieval, (b) kitchen, (c) boardgames.
Figure 9: Comparison of the details of the original image and the reconstruction images after MSHPE and HEVC. (a) QP = 40, location is (4,5); (b) QP = 40, location is (1,1); (c) QP = 20, location is (4,5).
Figure 10: Graphs of the number of bits and the average Y-PSNR as a function of QP for the five schemes: (a) curve of number of bits with QP, (b) curve of average Y-PSNR with QP.

15 pages, 4633 KiB  
Article
Faster Intra-Prediction of Versatile Video Coding Using a Concatenate-Designed CNN via DCT Coefficients
by Sio-Kei Im and Ka-Hou Chan
Electronics 2024, 13(11), 2214; https://doi.org/10.3390/electronics13112214 - 6 Jun 2024
Viewed by 713
Abstract
As the next generation video coding standard, Versatile Video Coding (VVC) significantly improves coding efficiency over the current High-Efficiency Video Coding (HEVC) standard. In practice, this improvement comes at the cost of increased pre-processing complexity, which makes VVC challenging to deploy where encoding time matters. This work presents a technique to simplify VVC intra-prediction using Discrete Cosine Transform (DCT) feature analysis and a concatenate-designed CNN. The coefficients of the DCT-transformed CUs reflect the complexity of the original texture, and the proposed CNN employs multiple classifiers to predict whether they should be split. This approach can determine whether to split Coding Units (CUs) of different sizes according to the Versatile Video Coding (VVC) standard, which helps to simplify the intra-prediction process. The experimental results indicate that our approach can reduce the encoding time by 52.77% with a minimal increase of 1.48% in Bjøntegaard Delta Bit Rate (BDBR) compared to the original algorithm, demonstrating a result competitive with other state-of-the-art methods in terms of coding efficiency and video quality. Full article
(This article belongs to the Special Issue Image and Video Processing Based on Deep Learning)
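The front end of the approach, turning a CU's luma samples into DCT coefficients whose spread reflects texture complexity, can be sketched as below. The high-frequency-energy ratio and the fixed threshold are only stand-ins for the concatenate-designed CNN classifiers described in the abstract.

```python
# Sketch: DCT-coefficient energy as a crude stand-in for the CNN split classifiers.
import numpy as np
from scipy.fft import dctn

def high_freq_ratio(cu):
    """Share of DCT energy outside the low-frequency quarter of the block."""
    c = dctn(cu.astype(float), norm="ortho")
    h, w = c.shape
    total = float(np.sum(c ** 2)) + 1e-12
    low = float(np.sum(c[: h // 2, : w // 2] ** 2))
    return 1.0 - low / total

cu = np.random.randint(0, 256, (32, 32))
ratio = high_freq_ratio(cu)
print("split" if ratio > 0.1 else "no split", f"(high-frequency ratio = {ratio:.3f})")  # 0.1 is illustrative
```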
Show Figures

Figure 1

Figure 1
<p>The complexity of the CU structure is highly dependent on the <math display="inline"><semantics> <mi mathvariant="italic">QP</mi> </semantics></math> values that control the video quality, so more complex parts are split into smaller CUs.</p>
Full article ">Figure 2
<p>Distribution of various CU sizes with respect to different <math display="inline"><semantics> <mi mathvariant="italic">QP</mi> </semantics></math>s.</p>
Full article ">Figure 3
<p>The computation of encoding and decoding features. The original colour is projected into the frequency domain using DCT transformation. The zero, low, medium, and high frequencies are, respectively, filled by yellow, blue, green, and orange colours. The resulting DCT coefficients are then passed to quantisation to reduce the higher-frequency domain, resulting in a simplified feature for optimal CU determination.</p>
Figure 4">
Figure 4
The main structure of the proposed model.
Figure 5
An illustration of feature extraction by VGGreNet, and the process of collecting these features using the concatenate-designed CNN with multiple classifiers.
25 pages, 940 KiB  
Article
Fast Versatile Video Coding (VVC) Intra Coding for Power-Constrained Applications
by Lei Chen, Baoping Cheng, Haotian Zhu, Haowen Qin, Lihua Deng and Lei Luo
Electronics 2024, 13(11), 2150; https://doi.org/10.3390/electronics13112150 - 31 May 2024
Cited by 2 | Viewed by 978
Abstract
Versatile Video Coding (VVC) achieves an impressive coding gain (about 40% or more) over the preceding High-Efficiency Video Coding (HEVC) technology at the cost of extremely high computational complexity. Such a large complexity increase is a serious challenge for power-constrained applications such as the Internet of Video Things. For intra coding, VVC applies a brute-force recursive search to both the coding unit (CU) partition structure, which is based on the quadtree with nested multi-type tree (QTMT), and 67 intra prediction modes, compared to 35 in HEVC. To address this, we offer optimization strategies for the CU partition decision and the intra coding modes to lessen the computational overhead. Regarding the high complexity of the CU partition process, CUs are first categorized as simple, fuzzy, and complex based on their texture characteristics. We then train two random forest classifiers to speed up the RDO-based brute-force recursive search: one directly predicts the optimal partition modes for simple and complex CUs, while the other determines early termination of the partition process for fuzzy CUs. Meanwhile, to reduce the complexity of intra mode prediction, a fast hierarchical intra mode search method is designed based on the texture features of CUs, including texture complexity, texture direction, and texture context information. Extensive experiments demonstrate that the proposed approach reduces complexity by up to 77% compared to the latest VVC reference software (VTM-23.1), and an average coding time saving of 70% is achieved with only a 1.65% increase in BDBR. Furthermore, compared to state-of-the-art methods, the proposed method achieves the largest time saving with comparable BDBR loss. These findings indicate that our method is superior to other up-to-date methods at lowering VVC intra coding complexity, providing an effective solution for power-constrained applications. Full article
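As a rough sketch of how texture features such as complexity and direction can feed a random forest partition classifier (the feature set, synthetic training data, and labels below are illustrative assumptions, not the authors' trained models), consider:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def texture_features(cu: np.ndarray) -> np.ndarray:
    """Simple texture descriptors for a luma CU: variance (complexity) and
    gradient statistics (direction) from first-order pixel differences."""
    cu = cu.astype(np.float64)
    gx = cu[:, 1:] - cu[:, :-1]            # horizontal differences
    gy = cu[1:, :] - cu[:-1, :]            # vertical differences
    return np.array([
        cu.var(),                                        # texture complexity
        np.abs(gx).mean(),                               # horizontal activity
        np.abs(gy).mean(),                               # vertical activity
        np.abs(gx).mean() / (np.abs(gy).mean() + 1e-6),  # dominant-direction ratio
    ])

# Synthetic stand-in for labels that would normally come from offline RDO results:
# 0 = do not split (flat CUs), 1 = split (busy CUs).
rng = np.random.default_rng(1)
flat_cus = [np.full((32, 32), 100.0) + rng.normal(0, 2, (32, 32)) for _ in range(50)]
busy_cus = [rng.normal(128, 40, (32, 32)) for _ in range(50)]
X = np.stack([texture_features(cu) for cu in flat_cus + busy_cus])
y = np.array([0] * 50 + [1] * 50)

clf = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=0).fit(X, y)
print(clf.predict([texture_features(rng.normal(128, 40, (32, 32)))]))  # likely [1]
```

In a real encoder the classifier's prediction would replace (or terminate early) the RDO search for the corresponding CU, which is where the time saving comes from.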
Figure 1
An illustration of the QTMT partition structure [1].
Figure 2">
Figure 2
Flowchart of the three-step intra mode decision in the VVC reference software.
Figure 3
An example of the quadtree with nested multi-type tree coding block structure.
Figure 4
Neighboring CUs.
Figure 5
Relation between texture direction and gradient.
Figure 6
Flowchart of the proposed fast CU partition decision based on the random forest classifier.
Figure 7
Different CUs with the same variance.
Figure 8
Illustration of the random forest RF_PM or RF_ET.
Figure 9
Flowchart of the hierarchical search method for fast intra mode prediction.
Figure 10
The accuracy of the two random forest classifiers. (a) RF_PM; (b) RF_ET.
Figure 11
Influence of different components in FCPD for different sequences. (a) BasketballDrive; (b) RaceHorses; (c) BasketballPass; (d) Johnny.
Figure 12
The accuracy of FIMP.
Figure 13
Performance results of the proposed FCPD, FIMD, and the overall algorithm compared with the VTM-7.0 encoder on RaceHorsesC. (a) RD curves; (b) time saving under different QPs.
17 pages, 512 KiB  
Article
Fast Coding Unit Partitioning Algorithm for Video Coding Standard Based on Block Segmentation and Block Connection Structure and CNN
by Nana Li, Zhenyi Wang and Qiuwen Zhang
Electronics 2024, 13(9), 1767; https://doi.org/10.3390/electronics13091767 - 2 May 2024
Viewed by 1126
Abstract
The recently introduced video coding standard VVC adopts a novel Quadtree plus Nested Multi-Type Tree (QTMTT) block structure. This structure enables a more flexible block partition and delivers better compression performance than its predecessor, HEVC. However, the new structure leads to a more complex partition search process and thus a considerable increase in time complexity. The QTMTT structure also yields Coding Unit (CU) blocks of diverse sizes, which complicates CNN model inference. In this study, we propose a representation structure termed Block Segmentation and Block Connection (BSC), rooted in texture features, which ensures that partial CU blocks are represented at a uniform size. To handle the different CU sizes, CNN models at several levels are designed for prediction. Moreover, we introduce a post-processing method and a multi-thresholding scheme to further mitigate the errors introduced by the CNNs, allowing flexible and adjustable acceleration that trades off coding time complexity against performance. Experimental results indicate that, in comparison to VTM-10.0, our “Fast” scheme reduces the average complexity by 57.14% with a 1.86% increase in BDBR, while the “Moderate” scheme reduces the average complexity by 50.14% with only a 1.39% increase in BDBR. Full article
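A multi-thresholding scheme of the kind mentioned above can be illustrated with a small sketch (the threshold values and the `split_decision` helper are hypothetical assumptions, not the paper's implementation): a CNN split probability is mapped either to a confident split, a confident non-split, or a fallback to the encoder's full RDO search.

```python
def split_decision(split_prob: float, t_low: float = 0.3, t_high: float = 0.7) -> str:
    """Map a CNN's split probability for a CU to an action.
    Looser thresholds defer more CUs to RDO (slower, smaller BDBR loss);
    tighter thresholds skip more RDO work (faster, larger BDBR loss)."""
    if split_prob >= t_high:
        return "split"      # trust the CNN: skip RDO checks for the non-split option
    if split_prob <= t_low:
        return "no_split"   # trust the CNN: skip RDO checks for the split options
    return "run_rdo"        # uncertain region: let the encoder's full RDO decide

# Different operating points (e.g., a faster vs. a more conservative scheme)
# could simply use different threshold pairs:
print(split_decision(0.82))                           # -> split
print(split_decision(0.55))                           # -> run_rdo
print(split_decision(0.62, t_low=0.45, t_high=0.6))   # aggressive setting -> split
```

Widening or narrowing the [t_low, t_high] band is what makes the acceleration "flexible and adjustable": the band directly controls how many CUs bypass the expensive RDO search.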
(This article belongs to the Special Issue Recent Advances in Image/Video Compression and Coding)
Figure 1
QTMTT structure in VVC.
Figure 2
Flowchart of the overall algorithm (B = CU size represents one of the CU sizes: 64 × 64, 32 × 32, 16 × 16, 32 × 16, 16 × 32, 32 × 8, 8 × 32).
Figure 3
Block segmentation and block connection structure data statistics.
Figure 4
Block segmentation and block connection structure.
Figure 5
The structure of CNN models.
Figure 6
Distribution of datasets (32 × 32 (left), 64 × 64 (center), 16 × 16 (right)).
Figure 7
Prediction accuracy of different threshold models.
Figure 8
The compression performance of the proposed method under different configurations of L1–L3.
Figure 9
Comparison with other methods. The horizontal axis is the BD–BR increase and the vertical axis is the encoding time acceleration factor.