research-article

Accelerating Transform Algorithm Implementation for Efficient Intra Coding of 8K UHD Videos

Authors:

Ge Li

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 18, Issue 4

Article No.: 113, Pages 1 - 20

https://doi.org/10.1145/3507970

Published: 04 March 2022

Abstract

Real-time ultra-high-definition (UHD) video applications have attracted much attention, and on the encoder side they urgently demand high-throughput hardware implementations of the two-dimensional (2D) transform for the latest video coding standards. This article proposes an effective method for accelerating the transform algorithm in UHD intra coding based on the third-generation Audio Video coding Standard (AVS3). First, based on a detailed statistical analysis, we devise an efficient, hardware-friendly transform algorithm that substantially reduces running cycles and resource consumption. Second, to implement multiplierless computation that saves resources and power, we investigate a series of shift-and-add unit (SAU) hardware designs that use far fewer shifters and adders than existing methods. Third, we design several types of hardware acceleration, including calculation pipelining, logical-loop unrolling, and module-level parallelism, to efficiently support data-intensive, high-frame-rate 8K UHD video coding. Finally, because 8K video sources are scarce, we also provide a new dataset for performance verification. Experimental results demonstrate that the proposed method achieves real-time 8K intra encoding at more than 60 fps with negligible rate-distortion (R-D) performance loss, averaging 0.98% in Bjontegaard-Delta Bit-Rate (BD-BR).
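To make the multiplierless (shift-and-add) idea and the separable 2D transform concrete, here is a minimal Python sketch. It is not the authors' design. It assumes the well-known HEVC 4-point integer DCT-II coefficients (64, 83, 36) as a stand-in for the AVS3 matrices, which the abstract does not list. It also omits the intermediate rounding shifts and clipping that a real encoder applies. The helper names (`mul83`, `dct4_sau`, `transform_2d`, and so on) are hypothetical. Each constant multiplication is decomposed into shifts and additions, and the result is checked against ordinary matrix multiplication.

```python
# Hypothetical illustration: the HEVC 4-point integer DCT-II matrix stands in
# for the AVS3 transform matrices, which are not given in this abstract.
C4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def mul64(x: int) -> int:
    # 64 = 2^6: a single shift, no adder
    return x << 6


def mul83(x: int) -> int:
    # 83 = 64 + 16 + 2 + 1: three shifts and three additions
    return (x << 6) + (x << 4) + (x << 1) + x


def mul36(x: int) -> int:
    # 36 = 32 + 4: two shifts and one addition
    return (x << 5) + (x << 2)


def dct4_matrix(x: list[int]) -> list[int]:
    """Reference 1D transform using ordinary multiplications."""
    return [sum(c * v for c, v in zip(row, x)) for row in C4]


def dct4_sau(x: list[int]) -> list[int]:
    """Multiplierless 1D transform: even/odd butterfly, then shift-and-add."""
    e0, e1 = x[0] + x[3], x[1] + x[2]  # even part
    o0, o1 = x[0] - x[3], x[1] - x[2]  # odd part
    return [
        mul64(e0 + e1),
        mul83(o0) + mul36(o1),
        mul64(e0 - e1),
        mul36(o0) - mul83(o1),
    ]


def transform_2d(block: list[list[int]], tx1d) -> list[list[int]]:
    """Separable 2D transform Y = C X C^T: 1D pass on rows, then on columns."""
    n = len(block)
    rows = [tx1d(r) for r in block]
    cols = [tx1d([rows[i][j] for i in range(n)]) for j in range(n)]
    # cols[j][i] holds coefficient (i, j); transpose back to row-major order
    return [[cols[j][i] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    block = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    assert transform_2d(block, dct4_sau) == transform_2d(block, dct4_matrix)
    print(transform_2d(block, dct4_sau)[0][0])  # 557056 (= 64 * 64 * block sum)
```

Expanding each constant into powers of two is the simplest shift-and-add scheme. Hardware designs can reduce shifter and adder counts further by sharing common subexpressions across constants. Because the row and column passes each operate line by line, they map naturally onto pipelined or parallel hardware stages. This sketch does not model that timing.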

Index Terms

1. Computing methodologies
  1. Computer graphics
    1. Image compression

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 18, Issue 4

November 2022

497 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3514185

Editor:
Abdulmotaleb El Saddik
Mohamed Bin Zayed University of Artificial Intelligence, UAE and University of Ottawa, Canada

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 March 2022

Accepted: 01 December 2021

Revised: 01 December 2021

Received: 01 July 2021

Published in TOMM Volume 18, Issue 4

Qualifiers

Research-article
Refereed

Funding Sources

Ministry of Science and Technology of China - Science and Technology Innovations 2030
Natural Science Foundation of China
Guangdong Basic and Applied Basic Research Foundation
Shenzhen Science and Technology Plan Basic Research Project
Shenzhen Fundamental Research Program

Bibliometrics

Total Citations: 25
Total Downloads: 549
Downloads (last 12 months): 90
Downloads (last 6 weeks): 1

Reflects downloads up to 05 Mar 2025

Cited By

Y. Wang, L. Feng, F. Cai, L. Li, R. Wu, and J. Li. 2024. TEC-CNN: Toward Efficient Compressing of Convolutional Neural Nets with Low-rank Tensor Decomposition. ACM Transactions on Multimedia Computing, Communications, and Applications 21, 2, 1–23. DOI: 10.1145/3702641. Online publication date: 5 Nov 2024.
https://dl.acm.org/doi/10.1145/3702641
G. Liao and W. Gao. 2024. Rethinking Feature Mining for Light Field Salient Object Detection. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 10, 1–24. DOI: 10.1145/3676967. Online publication date: 8 Jul 2024.
https://dl.acm.org/doi/10.1145/3676967
J. Pan, X. Liu, Y. Bai, D. Zhai, J. Jiang, and D. Zhao. 2024. Illumination-Aware Low-Light Image Enhancement with Transformer and Auto-Knee Curve. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 8, 1–23. DOI: 10.1145/3664653. Online publication date: 29 Jun 2024.
https://dl.acm.org/doi/10.1145/3664653
H. Yuan, W. Gao, S. Ma, and Y. Yan. 2024. Divide-and-conquer-based RDO-free CU Partitioning for 8K Video Compression. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 4, 1–20. DOI: 10.1145/3634705. Online publication date: 11 Jan 2024.
https://dl.acm.org/doi/10.1145/3634705
Y. Wang, T. Lu, Y. Yao, Y. Zhang, and Z. Xiong. 2024. Learning to Hallucinate Face in the Dark. IEEE Transactions on Multimedia 26, 2314–2326. DOI: 10.1109/TMM.2023.3294808. Online publication date: 1 Jan 2024.
https://dl.acm.org/doi/10.1109/TMM.2023.3294808
W. Gao and G. Li. 2024. Open-Source Projects for 3D Point Clouds. In Deep Learning for 3D Point Clouds, 255–272. DOI: 10.1007/978-981-97-9570-3_9. Online publication date: 10 Oct 2024.
https://doi.org/10.1007/978-981-97-9570-3_9
W. Gao and G. Li. 2024. Point Cloud-Language Multi-modal Learning. In Deep Learning for 3D Point Clouds, 227–254. DOI: 10.1007/978-981-97-9570-3_8. Online publication date: 10 Oct 2024.
https://doi.org/10.1007/978-981-97-9570-3_8
W. Gao and G. Li. 2024. Point Cloud Pre-trained Models and Large Models. In Deep Learning for 3D Point Clouds, 195–225. DOI: 10.1007/978-981-97-9570-3_7. Online publication date: 10 Oct 2024.
https://doi.org/10.1007/978-981-97-9570-3_7
W. Gao and G. Li. 2024. Deep-Learning-Based Point Cloud Analysis II. In Deep Learning for 3D Point Clouds, 163–193. DOI: 10.1007/978-981-97-9570-3_6. Online publication date: 10 Oct 2024.
https://doi.org/10.1007/978-981-97-9570-3_6
W. Gao and G. Li. 2024. Deep-Learning-Based Point Cloud Analysis I. In Deep Learning for 3D Point Clouds, 131–162. DOI: 10.1007/978-981-97-9570-3_5. Online publication date: 10 Oct 2024.
https://doi.org/10.1007/978-981-97-9570-3_5