research-article

Accelerating Transform Algorithm Implementation for Efficient Intra Coding of 8K UHD Videos

Published: 04 March 2022

Abstract

Real-time ultra-high-definition (UHD) video applications have attracted much attention, and on the encoder side they demand high-throughput hardware implementations of the two-dimensional (2D) transform for the latest video coding standards. This article proposes an acceleration method for the transform algorithm in UHD intra coding, based on the third generation of the Audio Video coding Standard (AVS3). First, guided by a detailed statistical analysis, we devise an efficient hardware-friendly transform algorithm that markedly reduces running cycles and resource consumption. Second, to achieve multiplierless computation that saves resources and power, we investigate a series of shift-and-add unit (SAU) designs that use far fewer shifters and adders than existing methods. Third, several hardware acceleration techniques, including calculation pipelining, logical-loop unrolling, and module-level parallelism, are designed to support data-intensive, high-frame-rate 8K UHD video coding. Finally, because 8K video sources are scarce, we also provide a new dataset for performance verification. Experimental results demonstrate that the proposed method achieves real-time 8K intra encoding at more than 60 fps with negligible loss in rate-distortion (R-D) performance: an average Bjontegaard-Delta Bit-Rate (BD-BR) of 0.98%.
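
The multiplierless idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the SAU design from the article: the function names `csd_digits` and `shift_add_multiply` are hypothetical, the constants 36, 83, and 89 are illustrative values rather than coefficients taken from the article, and the recoding used here is the generic canonical signed-digit (non-adjacent form) technique. The sketch only shows how multiplication by a fixed integer coefficient can be replaced by shifts, additions, and subtractions.

```python
def csd_digits(c: int) -> list[tuple[int, int]]:
    """Recode a non-negative constant c as signed powers of two.

    Returns (sign, shift) pairs with sum(sign * 2**shift) == c, using the
    canonical signed-digit (non-adjacent form) recoding. This is a generic
    technique, assumed here for illustration only.
    """
    digits = []
    shift = 0
    while c != 0:
        if c & 1:
            # c mod 4 == 1 -> digit +1; c mod 4 == 3 -> digit -1.
            d = 2 - (c & 3)
            digits.append((d, shift))
            c -= d
        c >>= 1
        shift += 1
    return digits


def shift_add_multiply(x: int, c: int) -> int:
    """Compute x * c using only shifts, additions, and subtractions."""
    acc = 0
    for sign, shift in csd_digits(c):
        term = x << shift
        acc = acc + term if sign > 0 else acc - term
    return acc


if __name__ == "__main__":
    # Illustrative constants only; not taken from the article.
    for coeff in (36, 83, 89):
        terms = csd_digits(coeff)
        print(coeff, terms, shift_add_multiply(7, coeff) == 7 * coeff)
```

In hardware, each (sign, shift) pair corresponds to one wired shift feeding an adder or subtractor, so fewer nonzero digits generally means fewer adders for that constant.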

    Information

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 4
    November 2022
    497 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/3514185
    Editor: Abdulmotaleb El Saddik

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 March 2022
    Accepted: 01 December 2021
    Revised: 01 December 2021
    Received: 01 July 2021
    Published in TOMM Volume 18, Issue 4

    Author Tags

    1. Image and video coding
    2. transform hardware architecture
    3. FPGA implementation
    4. ultra-high-definition (UHD) video
    5. 8K dataset

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • Ministry of Science and Technology of China - Science and Technology Innovations 2030
    • Natural Science Foundation of China
    • Guangdong Basic and Applied Basic Research Foundation
    • Shenzhen Science and Technology Plan Basic Research Project
    • Shenzhen Fundamental Research Program
