Deep Learning Approaches for Video Compression: A Bibliometric Analysis
List of Figures
Figure 1. Types of compression.
Figure 2. Applications of video compression.
Figure 3. Organization of paper.
Figure 4. Search strategy.
Figure 5. Comparative analysis of publications per year.
Figure 6. Alluvial diagram showing a correlation between authors, years, and source titles of top 20 cited documents.
Figure 7. Top keywords used in Scopus.
Figure 8. Category of publication.
Figure 9. Publishing country: Scopus.
Figure 10. Publication country: WoS.
Figure 11. Publishers in Scopus.
Figure 12. Publishers in WoS.
Figure 13. Co-occurrence analysis (author keywords).
Figure 14. Citation analysis of documents.
Figure 15. Citation analysis of documents.
Figure 16. Citation analysis by author.
Figure 17. Bibliographic analysis of documents.
Figure 18. Title of the publication and citations network visualization.
Figure 19. Timeline of video compression algorithms.
Figure 20. Traditional approach used by video codecs.
Figure 21. Video compression: issues and advantages of DNN approach.
Figure 22. Timeline for DNN-based video compression.
Figure 23. Video compression technologies.
Figure 24. Performance metrics for video compression.
Figure 25. Datasets used in video compression with year of introduction.
Figure 26. Challenges in video compression.
Abstract
1. Introduction
Applications of Video Compression
- Video Conferencing: Cloud-based video conferencing platforms such as Microsoft Teams, Zoom, etc., kept educational systems and industries functional during the pandemic, with live sessions and meetings running around the clock. High-quality live streams must be transferred over networks that often cannot deliver the original quality to the receiver. Efficient video compression technologies can therefore help achieve high-quality audio/video transfer over the internet and, moreover, help make such systems cross-compatible in heterogeneous hardware environments. A few approaches have been proposed [24,25,26] for compression in video conferencing, but each has its own set of advantages and disadvantages.
- Social Media, Online Education Platforms, and OTT Platforms: Today’s generation spends a great deal of time on social media. Instagram, Facebook, LinkedIn, YouTube, and WhatsApp are the most widely used platforms. General and professional networking, sharing photos, videos, and achievements, and funny or entertaining content attract users to them.
- Surveillance Video Applications: Applications such as smart traffic monitoring systems, drowsiness detection [24], identification of suspicious activities, CCTV [25], etc., also require a high-quality codec to store data as well as retrieve it from storage [26]. Maintaining data quality, supporting object detection and recognition, and semantically preserving objects or activities in videos are essential in such applications, which poses a challenge to the video codec.
- Multidisciplinary Applications: Currently, DL approaches are widely used in medicine, astronomy [34], security [35], autonomous driving [24], IoT [29,30,31,32], etc. In medicine [36,37], various surgeries are recorded for record keeping, educational purposes, or future use. Moreover, videos recorded from space are stored for study purposes or used in applications relying on location-based services. The number of smart cities is growing, and in smart cities various IoT devices capture video continuously for different purposes. As per a Cisco survey, there will be around 22 billion cameras in the world by 2022. Storing and processing data in each of the applications mentioned above is very challenging, and an efficient codec may fulfill the requirement.
- To conduct a bibliometric study of the various video codecs used for compression;
- To survey various deep learning-based approaches used by codecs for video compression;
- To study various performance metrics and datasets available for the study of video compression;
- To identify various real-time challenges faced by video codecs and future directions for research.
2. Research Strategy
2.1. Source and Methods
2.2. Data Selection and Extraction
- Query in Scopus
- Query in Web of Science
2.3. Data Analysis Procedure
- Analysis of documents by year;
- Citation based analysis;
- Top keywords from Scopus and Web of Science;
- Analysis of document type;
- Analysis by geographical area;
- Analysis of publication by source;
- Co-Occurrence analysis (author keywords).
3. Quantitative Analysis
3.1. Analysis of Documents by Year
3.2. Citation Based Analysis
4. Research Virtue
4.1. Top 10 Keywords from Scopus
4.2. Analysis of Document Type
4.3. Analysis of Geographical Area
4.4. Analysis of Publication by Source
4.5. Co-Occurrence Analysis (Author Keywords)
4.6. Citation Analysis of Documents
4.7. Citation Analysis of Source
4.8. Citation Analysis of Author
4.9. Bibliographic Coupling of Documents
4.10. Network Map of Publication Title and Citation
5. Qualitative Analysis
5.1. History of Video Compression
5.2. Traditional Approach
5.3. Issues in the Traditional Approach
- Traditional codecs are hardcoded approaches: All traditional codecs discussed earlier [55,56,57,58,59,60,61] perform compression in a static way. Because they are tailored to the specific input supplied to the compression process, their output degrades when the input changes even slightly. Moreover, they require hand-tuning of the parameters that play a crucial role in compression.
- Traditional codecs are not adaptive: Because codecs are designed and programmed for a specific type or set of inputs, they cannot be used for other kinds of data, and their performance on a new kind of input cannot be guaranteed. This is one of the major issues video codecs face, although dictionary-based learning provides adaptiveness to some extent.
- Further compression is more difficult: Because of the static and non-adaptive nature of the available video codecs, it is becoming tough to compress the available data any further.
- Current DNN approaches improve rate-distortion performance but make the model much slower and more complex. Moreover, they require more memory, which limits their practical usage.
- Today, even in well-connected regions such as the Bay Area, mobile network quality is variable. This can cause problems in the streaming and compression of data, and it is doubtful whether the network will support high-quality data.
5.4. Why Artificial Intelligence
- DL algorithms are adaptive: The beauty of DNN algorithms is their adaptiveness to the input. They learn themselves according to the input data. Even though we provide a large volume of data input, DNN algorithms can identify various trends and patterns and provide the maximum possible efficient solution to the problem. They may require extra time to learn, but they provide promising results once they understand the pattern. Moreover, humans do not need to babysit the algorithms in every execution step.
- Learn parameters to optimize the compression objective: Hyperparameter tuning is crucial to the results of DNN algorithms; several parameters must be set to optimal values to obtain more efficient results. The adaptive nature of DNN algorithms helps adjust those parameters according to the input, so programmers do not need to calculate and set those values manually, which removes a significant burden from their shoulders.
- Transfer learning: Another exciting advantage of DNN algorithms is transfer learning. Transfer learning [72] solves problems in a new domain using available data and previous experience. Since DNN algorithms have a comprehensive set of applications, we can reuse a model trained on one application for another and check whether it provides the expected results (a minimal fine-tuning sketch follows this list).
- Support a variety of data: DNN algorithms support multi-dimensional data of many varieties. They can be combined with ETL (Extract, Transform, Load) tooling and can generate results even in uncertain or dynamic environments.
- Continuous Improvement: DNN algorithms become smarter when exposed to a variety of data. They gain experiences from input data and go on improving efficiency and accuracy. Moreover, they help in increasing coding efficiency.
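As a concrete illustration of the transfer-learning point above, the following is a minimal fine-tuning sketch, assuming PyTorch and torchvision (version 0.13 or later) are installed; the class count, batch size, and random tensors are placeholders for illustration only and do not come from the surveyed papers.

```python
# Minimal transfer-learning sketch: reuse an ImageNet-pretrained backbone
# and train only a new classification head for a hypothetical target task.
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 backbone with ImageNet weights (downloads weights on first use).
backbone = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pretrained layers so their learned features are kept as-is.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the final layer for a hypothetical 5-class target task (trainable by default).
num_classes = 5
backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a random batch (stand-in for a real DataLoader).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
optimizer.zero_grad()
loss = criterion(backbone(images), labels)
loss.backward()
optimizer.step()
print(f"loss after one step: {loss.item():.4f}")
```

In practice, the random tensors would be replaced by a DataLoader over target-domain frames, and more layers could be unfrozen once the new head has converged.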
5.5. Proposed Deep Learning Approaches for Video Compression
5.6. Metrics for Performance Measurements
- MSE (Mean Square Error): It is the most common, simplest, and most widely used method for assessing image quality. It is also called Mean Squared Deviation (MSD) and calculates the average of the squared errors between two images; a value closer to zero indicates excellent image quality. For two $m \times n$ images $I$ and $K$, MSE (or MSD) is given by [120] $\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\big[I(i,j) - K(i,j)\big]^{2}$.
- RMSE (Root Mean Square Error): This is another method to assess image quality. RMSE is the square root of MSE and is an accurate estimator of the error between images [120]: $\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$.
- PSNR (Peak Signal to Noise Ratio): Various processes add noise distortion to the video/image. PSNR [121] measures the ratio of the maximum possible signal power to the power of the distortion noise added to it. It is the most widely used method for assessing image quality after lossy compression by a codec. It is given by [120] $\mathrm{PSNR} = 10\log_{10}\!\left(\frac{\mathit{peakval}^{2}}{\mathrm{MSE}}\right)$, where peakval (peak value) is the maximum possible value in the image data; for an 8-bit unsigned integer data type, peakval is 255.
- SSIM (Structural Similarity Index Method): It is one of the best-known methods for quantifying image degradation [121]. It exploits the strong inter-pixel dependency in images, and luminance, contrast, and structure are the factors considered when measuring structural similarity between images. The Multi-Scale Structural Similarity Index Method (MS-SSIM) is an advanced version of SSIM that evaluates structural similarity at different scales, with image size and resolution as extra factors compared to SSIM. Three-component SSIM (3-SSIM) [122] is a newly proposed variant based on the Human Visual System (HVS), exploiting the fact that the human eye observes differences between textures more efficiently than any system. We can also calculate the dissimilarity between two images, called DSSIM (Structural Dissimilarity). SSIM [123] and DSSIM [120] are given by $\mathrm{SSIM}(x,y) = [l(x,y)]^{\alpha}[c(x,y)]^{\beta}[s(x,y)]^{\gamma}$ and $\mathrm{DSSIM}(x,y) = \frac{1-\mathrm{SSIM}(x,y)}{2}$, where the luminance term is $l(x,y) = \frac{2\mu_x\mu_y + c_1}{\mu_x^2 + \mu_y^2 + c_1}$, the contrast term is $c(x,y) = \frac{2\sigma_x\sigma_y + c_2}{\sigma_x^2 + \sigma_y^2 + c_2}$, and the structure term is $s(x,y) = \frac{\sigma_{xy} + c_3}{\sigma_x\sigma_y + c_3}$. If α, β, and γ are equal to 1 (and $c_3 = c_2/2$), the index simplifies to $\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$. (A small NumPy sketch of MSE, RMSE, PSNR, and a simplified SSIM follows this list.)
- Features Similarity Index Matrix (FSIM): FSIM [124] is an advanced method that maps features from the image first and then finds similarities between two images. The method of mapping features in the picture is called Phase Congruency (PC), and the method of calculating the similarity between two images is called Gradient magnitude (GM).
- Classification Accuracy (CA): This is another measure, used when images are classified. It compares the generated image with the original image and reports how accurately it is classified, using a few sampling methods to do so. The accuracy depends on reference data for the original image, which sometimes has to be collected manually, so it may be a time-consuming process.
- Compression Rate (CR): It is a measure that explains what percentage of the original image is compressed without losing essential or important contents/artifacts of the image. It is widely used in applications such as photography, spatiotemporal data, etc.
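To make the metrics above concrete, here is a minimal NumPy sketch (not taken from any of the surveyed papers) that computes MSE, RMSE, PSNR, and a single-window simplification of SSIM for 8-bit grayscale images; production code would normally use a library such as scikit-image, which evaluates SSIM over local sliding windows.

```python
import numpy as np


def mse(ref: np.ndarray, test: np.ndarray) -> float:
    """Mean Squared Error / Mean Squared Deviation between two same-sized images."""
    diff = ref.astype(np.float64) - test.astype(np.float64)
    return float(np.mean(diff ** 2))


def rmse(ref: np.ndarray, test: np.ndarray) -> float:
    """Root Mean Squared Error: square root of MSE."""
    return float(np.sqrt(mse(ref, test)))


def psnr(ref: np.ndarray, test: np.ndarray, peakval: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB; peakval is 255 for 8-bit images."""
    err = mse(ref, test)
    return float("inf") if err == 0 else 10.0 * np.log10(peakval ** 2 / err)


def ssim_global(ref: np.ndarray, test: np.ndarray, peakval: float = 255.0) -> float:
    """Single-window SSIM over the whole image (alpha = beta = gamma = 1)."""
    x, y = ref.astype(np.float64), test.astype(np.float64)
    c1, c2 = (0.01 * peakval) ** 2, (0.03 * peakval) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return float(
        ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2))
        / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    original = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    noisy = np.clip(original + rng.normal(0, 5, size=(64, 64)), 0, 255).astype(np.uint8)
    print(f"MSE={mse(original, noisy):.2f}  RMSE={rmse(original, noisy):.2f}")
    print(f"PSNR={psnr(original, noisy):.2f} dB  SSIM={ssim_global(original, noisy):.4f}")
```

DSSIM can be obtained from the same routine as (1 − SSIM)/2.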
5.7. Study of Datasets
6. Discussion
6.1. Challenges in Video Compression
- Faster encoders do not guarantee compression efficiency: Most codecs try their best to compress, but they do not promise high compression efficiency. Although a few achieve good compression levels, they are slower than older codecs. This may be because of the variety and complexity of the data generated by devices today; another reason may be that data formats are changing very quickly. HFR, HDR, 4K, 6K, 8K, 3D, and 360-degree videos are newly evolved, challenging formats.
- Encoder search problem: Finding an efficient encoder for data compression is challenging; there are several hurdles that the encoder must overcome. Currently, ML algorithms are being used extensively to reduce encoder complexity. However, we must admit that ML has its own advantages and disadvantages.
- Many software encoders support only lower resolutions: Compression is becoming more difficult because of changes in data resolution, and it is becoming tough to find redundancy in the data for further compression.
- Further compression is more complex: Obtaining efficient output depends on changes made to the ML model and on the hardware required to run that model, both of which are time consuming and costly. The input data is not really in the programmer’s hands and will change in the future, so we need to develop a system that adapts to those changes and supports different levels of compression.
- Deep learning methods are very successful for applications such as image classification. However, such systems have been found to be very unstable when it comes to image restoration. A tiny change in the image can result in losing artifacts or important features and can degrade image quality. This may occur because of changes in resolution, faulty source equipment, use of an inappropriate method for processing the image, etc. This instability makes us think about how much we should rely on these deep learning-based methods.
6.2. Important Findings from the Analysis
- It has been observed that a considerable amount of compression work has been performed for textual data. Since every language has a limited number of characters and no new formats are expected in the future, we can conclude that data compression for textual data has almost reached its limit; most remaining work concerns encryption and decryption, depending on the requirements of the application. However, this is not the case for multimedia data, especially images and videos. After performing quantitative and qualitative bibliometric studies on images, it has been found that a great deal of work is completed or ongoing to achieve efficient image compression, and the latest compression work is adapting to the new formats evolving for images. When a bibliometric study was performed for video data, it was found that a tremendous amount of work is ongoing, trying to match the growth of internet-scale video traffic.
- International Telecommunication Union (ITU), International Standards Organization (ISO), and International Electrotechnical Commission (IEC) are major organizations that are working in the domain of video compression. MPEG and H.xxx are two families proposed by them. Versatile Video Coding (VVC) is the latest approach proposed by them in 2020. It has good results in live HD streaming and other online platforms.
- Traditional codecs use a transform (and inverse transform) and a quantizer (and dequantizer) for video compression. The main issue with them is the hardcoded approach, which requires hand-tuning of the parameters. Moreover, these approaches are static, so they are not adaptive and provide a lower compression rate (a toy sketch of this transform-and-quantize core follows this list).
- Using the DNN-based approach is a solution to issues with the traditional approach. They are adaptive, support a variety of data, and provide a promising compression rate. They support transfer learning and show continuous improvement in learning the data and providing results.
- A variety of DNN approaches have been used for image as well as video compression. CNN is a widely used approach; RNNs, GANs, autoencoders, and ensemble methods are the current approaches favored by researchers. They are widely used in applications such as OTT, social media, online education, surveillance systems, live streaming of HD data, video conferencing, and various multidisciplinary fields.
- PSNR (Peak Signal to Noise Ratio), SSIM (Structure Similarity Index Method), classification accuracy, and compression rate are metrics used for the performance analysis.
- Many video datasets are freely available to access. CityScapes, DIV2K, UVG, and xiph.org are a few famous datasets that are used by researchers. For applications in healthcare or space or surveillance systems, datasets need to be generated or should be made available by government institutes/organizations for testing purposes.
- The Computer Vision Foundation (CVF) [133] is a nonprofit organization that promotes and supports research in the field of computer vision. It organizes three kinds of events: CVPR (Computer Vision and Pattern Recognition), ICCV (International Conference on Computer Vision), and ECCV (European Conference on Computer Vision). Through these events, various workshops, courses, and competitions are organized, and research in the domain of computer vision is published. New Trends in Image Restoration and Enhancement (NTIRE) is a workshop and set of challenges on image and video restoration and enhancement organized at CVPR, while Advances in Image Manipulation (AIM) is a workshop and set of challenges on photo and video manipulation organized at ECCV.
- Alliance of Open Media (AOM) [134] is a famous organization that developed AV1 and AVIF. It has started investigating next-generation compression technology. It has challenged the world to design codecs beyond AV1.
- Stanford Compression Forum (SCF) [33] is a research group that extensively supports and promotes research in data compression. A group of researchers from Stanford University started this forum. This forum aims to transform academic research into technology or timely research problems or provide training in the field of data compression. “Stanford Compression Workshop 2021” is the latest event organized by this forum in February 2021.
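To illustrate the transform-and-quantize core mentioned in the findings above, here is a toy sketch, assuming NumPy and SciPy are available; it applies an 8×8 block DCT with a single hand-tuned quantization step and is only a simplified stand-in for what standards such as H.264/HEVC actually do.

```python
# Toy block-transform coding step: 8x8 DCT, uniform quantization, and the inverse path.
import numpy as np
from scipy.fftpack import dct, idct


def dct2(block: np.ndarray) -> np.ndarray:
    """2-D type-II DCT with orthonormal scaling."""
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")


def idct2(coeffs: np.ndarray) -> np.ndarray:
    """2-D inverse DCT with orthonormal scaling."""
    return idct(idct(coeffs, axis=0, norm="ortho"), axis=1, norm="ortho")


rng = np.random.default_rng(1)
block = rng.integers(0, 256, size=(8, 8)).astype(np.float64)  # one 8x8 pixel block

step = 16.0                              # hand-tuned quantization step (illustrative)
coeffs = dct2(block)                     # transform
quantized = np.round(coeffs / step)      # quantize: the only lossy step
reconstructed = idct2(quantized * step)  # dequantize + inverse transform

print("max reconstruction error:", np.abs(block - reconstructed).max())
```

The single scalar step stands in for the hand-tuned quantization matrices of real codecs; it is exactly this kind of fixed, manually chosen parameter that the learned approaches discussed above replace with trainable components.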
6.3. Future Directions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Bulao, J. How Much Data Is Created Every Day in 2021? Available online: https://techjury.net/blog/how-much-data-is-created-every-day/ (accessed on 1 November 2021).
- Munson, B. Video Will Account for 82% of All Internet Traffic by 2022, Cisco Says. Available online: https://www.fiercevideo.com/video/video-will-account-for-82-all-internet-traffic-by-2022-cisco-says (accessed on 2 November 2018).
- Cisco Inc. Cisco Annual Internet Report (2018–2023). Available online: https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html (accessed on 9 March 2020).
- Wallace, G.K. The JPEG Still Picture Compression Standard. IEEE Trans. Consum. Electron. 1991, 38, 43–59. Available online: https://jpeg.org/jpeg/software.html (accessed on 2 November 2021). [CrossRef]
- Rabbani, M.; Joshi, R. An overview of the JPEG 2000 still image compression standard. Signal Process. Image Commun. 2002, 17, 3–48. [Google Scholar] [CrossRef]
- Sikora, T. The MPEG-4 Video Standard Verification Model. IEEE Trans. Circuits Syst. Video Technol. 1997, 7, 19–31. [Google Scholar] [CrossRef] [Green Version]
- Duan, L.Y.; Huang, T.; Gao, W. Overview of the MPEG CDVS Standard. In Proceedings of the 2015 Data Compression Conference, Snowbird, UT, USA, 7–9 April 2015; pp. 323–332. [Google Scholar] [CrossRef]
- Brandenburg, K. AAC Explained MP3 and AAC Explained. 1999. Available online: http://www.searchterms.com (accessed on 4 January 2022).
- WinZip Computing, Inc. Homepage. Available online: http://www.winzip.com/ (accessed on 2 March 2004).
- Deutsch, P. GZIP File Format Specification, version 4.3. RFC1952. 1996; pp. 1–12. [Google Scholar] [CrossRef]
- Pu, I.M. Fundamentals of Data Compression; Elsevier: Amsterdam, The Netherlands, 2005. [Google Scholar]
- Salomon, D. Data Compression: The Complete Reference; Springer: London, UK, 2007. [Google Scholar]
- Nelson, M. The Data Compression Book; M & T Books: New York, NY, USA, 1991. [Google Scholar]
- Khalid, S. Introduction to Data Compression; Morgan Kaufmann: Burlington, VT, USA, 2017. [Google Scholar]
- Wei, W.-Y. An Introduction to Image Compression. Master’s Thesis, National Taiwan University, Taipei, Taiwan, 2008. [Google Scholar]
- David, S. A Concise Introduction to Data Compression; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2007. [Google Scholar]
- Johnson, P.D., Jr.; Harris, G.A. Introduction to Information Theory and Data Compression; CRC Press: Boca Raton, FL, USA, 2003. [Google Scholar]
- Blelloch, G.E. Introduction to Data Compression. Available online: https://www.cs.cmu.edu/~guyb/realworld/compression.pdf (accessed on 31 January 2013).
- Huffmant, D.A. A Method for the Construction of Minimum-Redundancy Codes. Proc. IRE 1952, 40, 1098–1101. [Google Scholar] [CrossRef]
- Rissanen, J.; Langdon, G. Arithmetic coding. IBM J. Res. Dev. 1979, 23, 149–162. [Google Scholar] [CrossRef] [Green Version]
- Choudhary, S.M.; Patel, A.S.; Parmar, S.J. Study of LZ77 and LZ78 Data Compression Techniques. Int. J. Eng. Sci. Innov. Technol. 2015, 4, 45–49. [Google Scholar]
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Jabbar, R.; Al-Khalifa, K.; Kharbeche, M.; Alhajyaseen, W.; Jafari, M.; Jiang, S. Real-time Driver Drowsiness Detection for Android Application Using Deep Neural Networks Techniques. Procedia Comput. Sci. 2018, 130, 400–407. [Google Scholar] [CrossRef]
- Varalakshmi, I.; Mahalakshmi, A.; Sriharini, P. Performance Analysis of Various Machine Learning Algorithm for Fall Detection-A Survey. In Proceedings of the 2020 International Conference on System, Computation, Automation and Networking (ICSCAN), Pondicherry, India, 3–4 July 2020; pp. 1–5. [Google Scholar] [CrossRef]
- Bagdanov, A.D.; Bertini, M.; del Bimbo, A.; Seidenari, L. Adaptive Video Compression for Video Surveillance Applications. In Proceedings of the 2011 IEEE International Symposium on Multimedia, Dana Point, CA, USA, 5–7 December 2011; pp. 190–197. [Google Scholar] [CrossRef]
- Lambert, S. Number of Social Media Users in 2022/2023: Demographics & Predictions. Available online: https://financesonline.com/number-of-social-media-users/ (accessed on 15 January 2022).
- Mini Balkrishan. OTT Platform Statistics in India Reveals Promising Growth. Available online: https://selectra.in/blog/ott-streaming-statistics (accessed on 15 January 2022).
- Krishnaraj, N.; Elhoseny, M.; Thenmozhi, M.; Selim, M.; Shankar, K. Deep learning model for real-time image compression in Internet of Underwater Things (IoUT). J. Real-Time Image Process. 2020, 17, 2097–2111. [Google Scholar] [CrossRef]
- Liu, Z.; Liu, T.; Wen, W.; Jiang, L.; Xu, J.; Wang, Y.; Quan, J. DeepN-JPEG. In Proceedings of the 55th Annual Design Automation Conference, San Francisco, CA, USA, 24–29 June 2018; pp. 1–6. [Google Scholar] [CrossRef]
- Azar, J.; Makhoul, A.; Couturier, R.; Demerjian, J. Robust IoT time series classification with data compression and deep learning. Neurocomputing 2020, 398, 222–234. [Google Scholar] [CrossRef]
- Park, J.; Park, H.; Choi, Y.-J. Data compression and prediction using machine learning for industrial IoT. In Proceedings of the 2018 International Conference on Information Networking (ICOIN), Chiang Mai, Thailand, 10–12 January 2018; pp. 818–820. [Google Scholar] [CrossRef]
- Stanford Compression Forum. Available online: https://compression.stanford.edu/ (accessed on 15 January 2022).
- Wang, J.; Shao, Z.; Huang, X.; Lu, T.; Zhang, R.; Lv, X. Spatial–temporal pooling for action recognition in videos. Neurocomputing 2021, 451, 265–278. [Google Scholar] [CrossRef]
- Herrero, A.; Corchado, E.; Gastaldo, P.; Picasso, F.; Zunino, R. Auto-Associative Neural Techniques for Intrusion Detection Systems. In Proceedings of the 2007 IEEE International Symposium on Industrial Electronics, Vigo, Spain, 4–7 June 2007; pp. 1905–1910. [Google Scholar] [CrossRef] [Green Version]
- Merali, Z.; Wang, J.Z.; Badhiwala, J.H.; Witiw, C.D.; Wilson, J.R.; Fehlings, M.G. A deep learning model for detection of cervical spinal cord compression in MRI scans. Sci. Rep. 2021, 11, 10473. [Google Scholar] [CrossRef] [PubMed]
- Ghamsarian, N.; Amirpourazarian, H.; Timmerer, C.; Taschwer, M.; Schöffmann, K. Relevance-Based Compression of Cataract Surgery Videos Using Convolutional Neural Networks. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, DC, USA, 12–16 October 2020; pp. 3577–3585. [Google Scholar] [CrossRef]
- Donthu, N.; Kumar, S.; Mukherjee, D.; Pandey, N.; Lim, W.M. How to conduct a bibliometric analysis: An overview and guidelines. J. Bus. Res. 2021, 133, 285–296. [Google Scholar] [CrossRef]
- Ebrahim, N.A.; Salehi, H.; Embi, M.A.; Habibi, F.; Gholizadeh, H.; Motahar, S.M.; Ordi, A. Effective strategies for increasing citation frequency. Int. Educ. Stud. 2013, 6, 93–99. [Google Scholar] [CrossRef] [Green Version]
- Donthu, N.; Kumar, S.; Pandey, N.; Lim, W.M. Research Constituents, Intellectual Structure, and Collaboration Patterns in Journal of International Marketing: An Analytical Retrospective. J. Int. Mark. 2021, 29, 1–25. [Google Scholar] [CrossRef]
- Scopus Database. Available online: https://www.scopus.com/home.uri (accessed on 15 January 2022).
- Web of Science. Available online: https://www.webofscience.com/wos/alldb/basic-search (accessed on 15 January 2022).
- Ding, D.; Ma, Z.; Chen, D.; Chen, Q.; Liu, Z.; Zhu, F. Advances in Video Compression System Using Deep Neural Network: A Review and Case Studies. Proc. IEEE 2021, 109, 1494–1520. [Google Scholar] [CrossRef]
- Ma, S.; Zhang, X.; Jia, C.; Zhao, Z.; Wang, S.; Wang, S. Image and Video Compression with Neural Networks: A Review. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 1683–1698. [Google Scholar] [CrossRef] [Green Version]
- Van Eck, N.J.; Waltman, L. Software survey: VOS viewer, a computer program for bibliometric mapping. Scientometrics 2010, 84, 523–538. [Google Scholar] [CrossRef] [Green Version]
- Bokhare, A.; Metkewar, P.S. Visualization and Interpretation of Gephi and Tableau: A Comparative Study. In Advances in Electrical and Computer Technologies; Springer: Singapore, 2021; pp. 11–23. [Google Scholar] [CrossRef]
- Persson, O.; Danell, R.; Schneider, J.W. How to use Bibexcel for various types of bibliometric analysis. Int. Soc. Scientometr. Informetr. 2009, 5, 9–24. [Google Scholar]
- Lu, G.; Zhang, X.; Ouyang, W.; Chen, L.; Gao, Z.; Xu, D. DVC: An End-to-End Learning Framework for Video Compression. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 3292–3308. [Google Scholar] [CrossRef]
- Gelenbe, E.; Sungur, M.; Cramer, C.; Gelenbe, P. Traffic and video quality with adaptive neural compression. Multimed. Syst. 1996, 4, 357–369. [Google Scholar] [CrossRef]
- Chen, T.; Liu, H.; Shen, Q.; Yue, T.; Cao, X.; Ma, Z. DeepCoder: A deep neural network-based video compression. In Proceedings of the 2017 IEEE Visual Communications and Image Processing, VCIP, St. Petersburg, FL, USA, 10–13 December 2017; pp. 1–4. [Google Scholar] [CrossRef]
- Djelouah, A.; Campos, J.; Schaub-Meyer, S.; Schroers, C. Neural Inter-Frame Compression for Video Coding. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 6420–6428. [Google Scholar] [CrossRef]
- Afonso, M.; Zhang, F.; Bull, D.R. Video Compression Based on Spatio-Temporal Resolution Adaptation. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 275–280. [Google Scholar] [CrossRef]
- Kaplanyan, A.S.; Sochenov, A.; Leimkühler, T.; Okunev, M.; Goodall, T.; Rufo, G. DeepFovea: Neural reconstruction for foveated rendering and video compression using learned statistics of natural videos. ACM Trans. Graph. 2019, 38, 212. [Google Scholar] [CrossRef] [Green Version]
- Cramer, C. Neural networks for image and video compression: A review. Eur. J. Oper. Res. 1998, 108, 266–282. [Google Scholar] [CrossRef]
- ITU-T Recommendation H.261. Available online: https://www.ic.tu-berlin.de/fileadmin/fg121/Source-Coding_WS12/selected-readings/14_T-REC-H.261-199303-I__PDF-E.pdf (accessed on 4 January 2022).
- ISO/IEC 11172-2; (MPEG-1), Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to About 1.5 Mbit/s Part 2: Video. Available online: https://www.iso.org/standard/22411.html (accessed on 4 January 2022).
- Information Technology—Generic Coding of Moving Pictures and Associated Audio Information Part 2: Video, ITU-T Rec. H.262 and ISO/IEC 138182 (MPEG 2 Video). Available online: https://www.sis.se/api/document/preview/916666/ (accessed on 4 January 2022).
- Akramullah, S.M.; Ahmad, I.; Liou, M.L. Optimization of H.263 Video Encoding Using a Single Processor Computer: Performance Tradeoffs and Benchmarking. IEEE Trans. Circuits Syst. Video Technol. 2001, 11, 901–915. [Google Scholar] [CrossRef]
- ISO/IEC 14496-2:1999; Coding of Audio-Visual Objects—Part 2: Visual, ISO/IEC 144962 (MPEG-4 Visual version 1). 1999. Available online: https://www.iso.org/standard/25034.html (accessed on 4 January 2022).
- H.264; ITU-T, Advanced Video Coding for Generic Audio-Visual Services, ITU-T Rec. H.264 and ISO/IEC 14496-10 (AVC). 2003. Available online: https://www.itu.int/rec/T-REC-H.264 (accessed on 4 January 2022).
- Sullivan, G.J.; Ohm, J.R.; Han, W.J.; Wiegand, T. Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1649–1668. [Google Scholar] [CrossRef]
- Chiariglione, L.; Timmerer, C. ISO/IEC JTC 1/SC 29/WG 11/N17482; MPEG Press: San Diego, CA, USA, 2018. [Google Scholar]
- Laude, T.; Adhisantoso, Y.G.; Voges, J.; Munderloh, M.; Ostermann, J. A Comprehensive Video Codec Comparison. APSIPA Trans. Signal Inf. Process. 2019, 8, e30. [Google Scholar] [CrossRef] [Green Version]
- Nagabhushana Raju, K.; Ramachandran, S. Implementation of Intrapredictions, Transform, Quantization and CAVLC for H.264 Video Encoder. 2011. Available online: http://www.irphouse.com (accessed on 4 January 2022).
- Tošić, I.; Frossard, P. Dictionary Learning. IEEE Signal Process. Mag. 2011, 28, 27–38. [Google Scholar] [CrossRef]
- Kreutz-Delgado, K.; Murray, J.F.; Rao, B.D.; Engan, K.; Lee, T.-W.; Sejnowski, T.J. Dictionary Learning Algorithms for Sparse Representation. Neural Comput. 2003, 15, 349–396. [Google Scholar] [CrossRef] [Green Version]
- Mairal, J.; Bach, F.; Ponce, J.; Sapiro, G. Online dictionary learning for sparse coding. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML ‘09), Montreal, QC, Canada, 14–18 June 2009; Association for Computing Machinery: New York, NY, USA, 2009; pp. 689–696. [Google Scholar] [CrossRef] [Green Version]
- Sun, L.; Duanmu, F.; Liu, Y.; Wang, Y.; Ye, Y.; Shi, H.; Dai, D. Multi-path multi-tier 360-degree video streaming in 5G networks. In Proceedings of the 9th ACM Multimedia Systems Conference, Amsterdam, The Netherlands, 12–15 June 2018; pp. 162–173. [Google Scholar] [CrossRef]
- Chakareski, J. Adaptive multiview video streaming: Challenges and opportunities. IEEE Commun. Mag. 2013, 51, 94–100. [Google Scholar] [CrossRef]
- Kalva, H.; Christodoulou, L.; Mayron, L.; Marques, O.; Furht, B. Challenges and Opportunities in Video Coding for 3D TV. In Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, Toronto, ON, Canada, 9–12 July 2006; pp. 1689–1692. [Google Scholar] [CrossRef]
- Said, A. Machine learning for media compression: Challenges and opportunities. APSIPA Trans. Signal Inf. Process. 2018, 7, e8. [Google Scholar] [CrossRef] [Green Version]
- Li, J.; Wu, W.; Xue, D. Research on transfer learning algorithm based on support vector machine. J. Intell. Fuzzy Syst. 2020, 38, 4091–4106. [Google Scholar] [CrossRef]
- Johnston, N.; Vincent, D.; Minnen, D.; Covell, M.; Singh, S.; Chinen, T.; Hwang, S.J.; Shor, J.; Toderici, G. Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks. 2018. Available online: https://storage.googleapis.com/compression- (accessed on 4 January 2022).
- Toderici, G.; Vincent, D.; Johnston, N.; Hwang, S.J.; Minnen, D.; Shor, J.; Covell, M. Full Resolution Image Compression with Recurrent Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Toderici, G.; O’Malley, S.M.; Hwang, S.J.; Vincent, D.; Minnen, D.; Baluja, S.; Covell, M.; Sukthankar, R. Variable Rate Image Compression with Recurrent Neural Networks. 2015. Available online: http://arxiv.org/abs/1511.06085 (accessed on 4 January 2022).
- Agustsson, E.; Mentzer, F.; Tschannen, M.; Cavigelli, L.; Timofte, R.; Benini, L.; Van Gool, L. Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations. 2017. Available online: http://arxiv.org/abs/1704.00648 (accessed on 4 January 2022).
- Zhou, L.; Sun, Z.; Wu, X.; Wu, J. End-to-end Optimized Image Compression with Attention Mechanism. In Proceedings of the CVPR Workshops, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Ballé, J.; Minnen, D.; Singh, S.; Hwang, S.J.; Johnston, N. Variational Image Compression with a Scale Hyperprior. 2018. Available online: http://arxiv.org/abs/1802.01436 (accessed on 4 January 2022).
- Agustsson, E.; Tschannen, M.; Mentzer, F.; Timofte Luc Van Gool, R.; Zürich, E. Generative Adversarial Networks for Extreme Learned Image Compression. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Li, M.; Zuo, W.; Gu, S.; Zhao, D.; Zhang, D. Learning Convolutional Networks for Content-weighted Image Compression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Ballé, J.; Laparra, V.; Simoncelli, E.P. End-to-End Optimized Image Compression. 2016. Available online: http://arxiv.org/abs/1611.01704 (accessed on 4 January 2022).
- Rippel, O.; Bourdev, L. Real-Time Adaptive Image Compression. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017. [Google Scholar]
- Theis, L.; Shi, W.; Cunningham, A.; Huszár, F. Lossy Image Compression with Compressive Autoencoders. 2017. Available online: http://arxiv.org/abs/1703.00395 (accessed on 4 January 2022).
- Liu, D.; Li, Y.; Lin, J.; Li, H.; Wu, F. Deep Learning-Based Video Coding: A Review and A Case Study. Proc. IEEE 2021, 53, 1–35. [Google Scholar] [CrossRef] [Green Version]
- Sangeeta, P.G.; Gill, N.S. Comprehensive Analysis of Flow Incorporated Neural Network-based Lightweight Video Compression Architecture. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 503–508. [Google Scholar]
- Birman, R.; Segal, Y.; Hadar, O. Overview of Research in the field of Video Compression using Deep Neural Networks. Multimed. Tools Appl. 2020, 79, 11699–11722. [Google Scholar] [CrossRef]
- Lu, G.; Ouyang, W.; Xu, D.; Zhang, X.; Gao, Z.; Sun, M.-T. Deep Kalman Filtering Network for Video Compression Artifact Reduction. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
- Yang, R.; Xu, M.; Wang, Z.; Li, T. Multi-Frame Quality Enhancement for Compressed Video. 2018. Available online: https://github.com/ryangBUAA/MFQE.git (accessed on 4 January 2022).
- Wu, C.-Y. Video Compression through Image Interpolation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
- Liu, Z.; Yu, X.; Gao, Y.; Chen, S.; Ji, X.; Wang, D. CU Partition Mode Decision for HEVC Hardwired Intra Encoder Using Convolution Neural Network. IEEE Trans. Image Process. 2016, 25, 5088–5103. [Google Scholar] [CrossRef] [PubMed]
- Song, R.; Liu, D.; Li, H.; Wu, F. Neural network-based arithmetic coding of intra prediction modes in HEVC. In Proceedings of the 2017 IEEE Visual Communications and Image Processing (VCIP), St. Petersburg, FL, USA, 10–13 December 2017. [Google Scholar] [CrossRef] [Green Version]
- Cheng, S.; Dong, Y.; Pang, T.; Su, H.; Zhu, J. Improving Black-box Adversarial Attacks with a Transfer-based Prior. Adv. Neural Inf. Process. Syst. 2020, 10934–10944. [Google Scholar] [CrossRef]
- Wei, X.; Zhu, J.; Su, H. Sparse Adversarial Perturbations for Videos. 2018. Available online: http://arxiv.org/abs/1803.02536 (accessed on 4 January 2022).
- Li, S.; Neupane, A.; Paul, S.; Song, C.; Krishnamurthy, S.V.; Chowdhury, A.K.R.; Swami, A. Adversarial Perturbations against Real-Time Video Classification Systems. arXiv 2018, arXiv:1807.00458. [Google Scholar] [CrossRef]
- Cheng, Y.; Wei, X.; Fu, H.; Lin, S.-W.; Lin, W. Defense for adversarial videos by self-adaptive JPEG compression and optical texture. In Proceedings of the 2nd ACM International Conference on Multimedia in Asia, Singapore, 7 March 2021; pp. 1–7. [Google Scholar] [CrossRef]
- Darwish, S.M.; Almajtomi, A.A.J. Metaheuristic-based vector quantization approach: A new paradigm for neural network-based video compression. Multimed. Tools Appl. 2021, 80, 7367–7396. [Google Scholar] [CrossRef]
- Jia, W.; Li, L.; Li, Z.; Liu, S. Deep Learning Geometry Compression Artifacts Removal for Video-Based Point Cloud Compression. Int. J. Comput. Vis. 2021, 129, 2947–2964. [Google Scholar] [CrossRef]
- Jia, W.; Li, L.; Akhtar, A.; Li, Z.; Liu, S. Convolutional Neural Network-based Occupancy Map Accuracy Improvement for Video-based Point Cloud Compression. IEEE Trans. Multimed. 2021. [Google Scholar] [CrossRef]
- Sangeeta; Gulia, P. Improved Video Compression Using Variable Emission Step ConvGRU Based Architecture. Lect. Notes Data Eng. Commun. Technol. 2021, 61, 405–415. [Google Scholar] [CrossRef]
- Park, W.; Kim, M. Deep Predictive Video Compression Using Mode-Selective Uni- and Bi-Directional Predictions Based on Multi-Frame Hypothesis. IEEE Access 2021, 9, 72–85. [Google Scholar] [CrossRef]
- Sinha, A.K.; Mishra, D. T3D-Y Codec: A Video Compression Framework using Temporal 3-D CNN Encoder and Y-Style CNN Decoder. In Proceedings of the 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 1–3 July 2020. [Google Scholar] [CrossRef]
- Dhungel, P.; Tandan, P.; Bhusal, S.; Neupane, S.; Shakya, S. An Efficient Video Compression Network. In Proceedings of the IEEE 2020 2nd International Conference on Advances in Computing, Communication Control and Networking, ICACCCN, Greater Noida, India, 18–19 December 2020; pp. 1028–1034. [Google Scholar] [CrossRef]
- Santamaria, M.; Blasi, S.; Izquierdo, E.; Mrak, M. Analytic Simplification of Neural Network Based Intra-Prediction Modes For Video Compression. In Proceedings of the 2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), London, UK, 6–10 July 2020; pp. 1–4. [Google Scholar] [CrossRef]
- Zhu, S.; Liu, C.; Xu, Z. High-Definition Video Compression System Based on Perception Guidance of Salient Information of a Convolutional Neural Network and HEVC Compression Domain. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 1946–1959. [Google Scholar] [CrossRef]
- Ma, D.; Zhang, F.; Bull, D.R. GAN-based Effective Bit Depth Adaptation for Perceptual Video Compression. In Proceedings of the 2020 IEEE International Conference on Multimedia and Expo (ICME), London, UK, 6–10 July 2020. [Google Scholar]
- Poyser, M.; Atapour-Abarghouei, A.; Breckon, T.P. On the Impact of Lossy Image and Video Compression on the Performance of Deep Convolutional Neural Network Architectures. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 2830–2837. [Google Scholar] [CrossRef]
- He, G.; Wu, C.; Li, L.; Zhou, J.; Wang, X.; Zheng, Y.; Yu, B.; Xie, W. A Video Compression Framework Using an Overfitted Restoration Neural Network. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 593–597. [Google Scholar] [CrossRef]
- Mameli, F.; Bertini, M.; Galteri, L.; del Bimbo, A. A NoGAN approach for image and video restoration and compression artifact removal. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 9326–9332. [Google Scholar] [CrossRef]
- Feng, R.; Wu, Y.; Guo, Z.; Zhang, Z.; Chen, Z. Learned Video Compression with Feature-level Residuals. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 529–532. [Google Scholar] [CrossRef]
- Chen, W.-G.; Yu, R.; Wang, X. Neural Network-Based Video Compression Artifact Reduction Using Temporal Correlation and Sparsity Prior Predictions. IEEE Access 2020, 8, 162479–162490. [Google Scholar] [CrossRef]
- Liu, D.; Chen, Z.; Liu, S.; Wu, F. Deep Learning-Based Technology in Responses to the Joint Call for Proposals on Video Compression with Capability Beyond HEVC. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 1267–1280. [Google Scholar] [CrossRef]
- Pham, T.T.; Hoang, X.V.; Nguyen, N.T.; Dinh, D.T.; Ha, L.T. End-to-End Image Patch Quality Assessment for Image/Video with Compression Artifacts. IEEE Access 2020, 8, 215157–215172. [Google Scholar] [CrossRef]
- Chen, Z.; He, T.; Jin, X.; Wu, F. Learning for Video Compression. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 566–576. [Google Scholar] [CrossRef] [Green Version]
- Jadhav, A. Variable rate video compression using a hybrid recurrent convolutional learning framework. In Proceedings of the 2020 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 22–24 January 2020. [Google Scholar] [CrossRef]
- Wu, Y.; He, T.; Chen, Z. Memorize, Then Recall: A Generative Framework for Low Bit-rate Surveillance Video Compression. In Proceedings of the 2020 IEEE International Symposium on Circuits and Systems, Seville, Spain, 12–14 October 2020. [Google Scholar]
- Lu, G.; Zhang, X.; Ouyang, W.; Xu, D.; Chen, L.; Gao, Z. Deep Non-Local Kalman Network for Video Compression Artifact Reduction. IEEE Trans. Image Process. 2020, 29, 1725–1737. [Google Scholar] [CrossRef]
- Ma, D.; Zhang, F.; Bull, D. Video compression with low complexity CNN-based spatial resolution adaptation. arXiv 2020, arXiv:2007.14726. [Google Scholar] [CrossRef]
- Cao, C.; Preda, M.; Zaharia, T. 3D Point Cloud Compression. In Proceedings of the 24th International Conference on 3D Web Technology, Los Angeles, CA, USA, 26–28 July 2019; pp. 1–9. [Google Scholar] [CrossRef] [Green Version]
- Yu, S.; Sun, S.; Yan, W.; Liu, G.; Li, X. A Method Based on Curvature and Hierarchical Strategy for Dynamic Point Cloud Compression in Augmented and Virtual Reality System. Sensors 2022, 22, 1262. [Google Scholar] [CrossRef]
- Sara, U.; Akter, M.; Uddin, M.S. Image Quality Assessment through FSIM, SSIM, MSE and PSNR—A Comparative Study. J. Comput. Commun. 2019, 7, 8–18. [Google Scholar] [CrossRef] [Green Version]
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Li, C.; Bovik, A.C. Three-Component Weighted Structural Similarity Index. Available online: http://live.ece.utexas.edu/publications/2009/cl_spie09.pdf (accessed on 4 January 2022).
- Brooks, A.C.; Zhao, X.; Pappas, T.N. Structural Similarity Quality Metrics in a Coding Context: Exploring the Space of Realistic Distortions. IEEE Trans. Image Process. 2008, 17, 1261–1273. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Kumar, R.; Moyal, V. Visual Image Quality Assessment Technique using FSIM. Int. J. Comput. Appl. Technol. Res. 2013, 2, 250–254. [Google Scholar] [CrossRef]
- Quinlan, J.J.; Zahran, A.H.; Sreenan, C.J. Datasets for AVC (H.264) and HEVC (H.265) evaluation of dynamic adaptive streaming over HTTP (DASH). In Proceedings of the 7th International Conference on Multimedia Systems, Shenzhen, China, 10–13 May 2016; pp. 1–6. [Google Scholar] [CrossRef]
- Feuvre, J.L.; Thiesse, J.-M.; Parmentier, M.; Raulet, M.; Daguet, C. Ultra high definition HEVC DASH data set. In Proceedings of the 5th ACM Multimedia Systems Conference on MMSys ’14, Singapore, 19 March 2014; pp. 7–12. [Google Scholar] [CrossRef]
- Quinlan, J.J.; Sreenan, C.J. Multi-profile ultra-high definition (UHD) AVC and HEVC 4K DASH datasets. In Proceedings of the 9th ACM Multimedia Systems Conference, Amsterdam, The Netherlands, 12–15 June 2018; pp. 375–380. [Google Scholar] [CrossRef]
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes Dataset for Semantic Urban Scene Understanding. 2016. Available online: https://www.cityscapes-dataset.com/wordpress/wp-content/papercite-data/pdf/cordts2016cityscapes.pdf (accessed on 4 January 2022).
- Cordts, M.; Omran, M.; Ramos, S.; Scharwächter, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes Dataset. 2015. Available online: https://www.cityscapes-dataset.com/wordpress/wp-content/papercite-data/pdf/cordts2015cvprw.pdf (accessed on 4 January 2022).
- Seeling, P.; Reisslein, M. Video transport evaluation with H.264 video traces. IEEE Commun. Surv. Tutor. 2012, 14, 1142–1165. [Google Scholar] [CrossRef] [Green Version]
- Pulipaka, A.; Seeling, P.; Reisslein, M.; Karam, L.J. Traffic and Statistical Multiplexing Characterization of 3D Video Representation Formats. 2013. Available online: http://trace.eas.asu.edu (accessed on 4 January 2022).
- Seeling, P.; Reisslein, M. Video Traffic Characteristics of Modern Encoding Standards: H.264/AVC with SVC and MVC Extensions and H.265/HEVC. Sci. World J. 2014, 2014, 1–16. [Google Scholar] [CrossRef] [Green Version]
- Mercat, A.; Viitanen, M.; Vanne, J. UVG dataset. In Proceedings of the 11th ACM Multimedia Systems Conference, Istanbul, Turkey, 8–11 June 2020; pp. 297–302. [Google Scholar] [CrossRef]
- Alliance for Open Media. Available online: https://aomedia.org/ (accessed on 4 January 2022).
- Ma, D.; Zhang, F.; Bull, D. BVI-DVC: A Training Database for Deep Video Compression. IEEE Trans. Multimed. 2021, 1. [Google Scholar] [CrossRef]
- Xue, T.; Chen, B.; Wu, J.; Wei, D.; Freeman, W.T. Video Enhancement with Task-Oriented Flow. J. Comput. Vis. 2019, 127, 1106–1125. [Google Scholar] [CrossRef] [Green Version]
- Krovi, R.; Pacht, W.E. Feasibility of self-organization in image compression. In Proceedings of the IEEE/ACM International Conference on Developing and Managing Expert System Programs, Washington, DC, USA, 30 September–2 October 1991; pp. 210–214. [Google Scholar] [CrossRef]
- Gastaldo, P.; Zunino, R.; Rovetta, S. Objective assessment of MPEG-2 video quality. J. Electron. Imaging 2002, 11, 365. [Google Scholar] [CrossRef]
- Gastaldo, P.; Rovetta, S.; Zunino, R. Objective quality assessment of MPEG-2 video streams by using CBP neural networks. IEEE Trans. Neural Netw. 2002, 13, 939–947. [Google Scholar] [CrossRef]
- The Computer Vision Foundation. Available online: https://www.thecvf.com/ (accessed on 4 January 2022).
Fundamental Keyword | Video Compression |
---|---|
Primary Keyword Using “AND” | Neural Networks |
Secondary Keywords Using “OR” | “GAN”, “Generative Adversarial Network”, “CNN”, “Convolutional Neural Network” |
Author’s Keywords Using “OR” | “Video Compression”, “Compression” |
Year | <2017 | 2017 | 2018 | 2019 | 2020 | 2021 | Total |
---|---|---|---|---|---|---|---|
Scopus Citation | 125 | 8 | 13 | 35 | 151 | 210 | 542 |
Web of Science Citation | 18 | 1 | 3 | 21 | 57 | 88 | 188 |
References and Years | Authors | <2017 | 2017 | 2018 | 2019 | 2020 | 2021 | Total |
---|---|---|---|---|---|---|---|---|
[48] (2021) | Lu G. et al. | 0 | 0 | 0 | 4 | 39 | 34 | 77 |
[49] (1996) | Gelenbe E. et al. | 50 | 2 | 8 | 0 | 2 | 3 | 65
[44] (2020) | Ma S. et al. | 0 | 0 | 0 | 2 | 11 | 27 | 40
[50] (2018) | Chen T. et al. | 0 | 0 | 1 | 9 | 16 | 14 | 40 |
[51] (2019) | Djelouah A. et al. | 0 | 0 | 0 | 1 | 17 | 13 | 31 |
References and Years | Authors | <2017 | 2017 | 2018 | 2019 | 2020 | 2021 | Total |
---|---|---|---|---|---|---|---|---|
[44] (2020) | Ma S. et al. | 0 | 0 | 0 | 5 | 12 | 27 | 44
[52] (2019) | Afonso M. et al. | 0 | 0 | 0 | 8 | 9 | 7 | 24
[50] (2018) | Chen T. et al. | 0 | 0 | 1 | 4 | 9 | 8 | 22
[53] (2019) | Kaplanyan A.S. et al. | 0 | 0 | 0 | 0 | 11 | 10 | 21
[54] (1998) | Cramer C. | 9 | 0 | 1 | 0 | 1 | 0 | 11 |
Type of Publication | Scopus | Web of Science | Total |
---|---|---|---|
Conference Paper | 44 | 16 | 60 |
Article/Journal | 29 | 18 | 47 |
Review | 00 | 02 | 02 |
Book Chapter | 11 | 00 | 11 |
Total | 84 | 37 | 121
Country | Count |
---|---|
China | 10 |
USA | 8 |
Australia | 3 |
Italy | 3 |
South Korea | 3 |
Egypt | 2 |
India | 2 |
Country | Count |
---|---|
China | 21 |
USA | 12 |
India | 10 |
UK | 9 |
Italy | 5 |
Poland | 4 |
Australia | 3 |
Keyword | Occurrence | Number of Links | Total Link Strength (TLS) |
---|---|---|---|
Video Compression | 40 | 100 | 144 |
Deep Learning | 15 | 46 | 64 |
Convolutional Neural Network/s (CNN) | 11 | 45 | 45 |
Neural Network/s | 11 | 29 | 42 |
High-Efficiency Video Coding(HEVC) | 10 | 42 | 49 |
Video Coding | 7 | 26 | 34 |
Image Compression | 6 | 17 | 25 |
Deep Neural Network | 4 | 9 | 13 |
Rate distortion optimization | 3 | 12 | 16 |
Image Processing | 3 | 9 | 10 |
Image Coding | 3 | 9 | 10 |
Cellular Neural Networks | 3 | 6 | 8 |
Image/Video Compression | 3 | 17 | 7 |
Encoding | 2 | 17 | 18 |
Transform coding | 2 | 15 | 17 |
HD video | 2 | 10 | 13 |
Spatiotemporal Saliency | 2 | 10 | 13 |
Compression Artifact reduction | 2 | 8 | 8 |
Discrete Cosine Transform | 2 | 7 | 8 |
Effective bit depth adaptation | 2 | 7 | 8 |
Document Author | Citations | Links |
---|---|---|
Lu G. (2019) | 75 | 0 |
Gelenbe E. (1996) | 65 | 2 |
Ma S. (2020) | 37 | 4 |
Chen T. (2018) | 37 | 0 |
Djelouah A. (2019) | 27 | 4 |
Afonso M. (2019) | 27 | 4 |
Kaplanyan A.S. (2019) | 22 | 0 |
Cramer Christopher (1998) | 22 | 0 |
Chen Z. (2020) | 21 | 2 |
Cramer C. (1998) | 20 | 1 |
Xu Y. (2019) | 18 | 0 |
Lu G. (2018) | 11 | 0 |
Source | Documents | Citations | Links | TLS |
---|---|---|---|---|
IEEE transactions on circuits and systems for video technology | 7 | 102 | 7 | 10 |
Lecture Notes in Computer Science | 6 | 23 | 1 | 0 |
IEEE access | 4 | 9 | 1 | 0 |
International conference on image processing, ICIP | 3 | 17 | 1 | 1 |
IEEE international conference on computer vision | 2 | 47 | 0 | |
IEEE computer society conference on computer vision and pattern recognition | 1 | 75 | 1 | 0 |
Multimedia systems | 1 | 65 | 2 | 2 |
IEEE visual communications and image processing, VCIP 2017 | 1 | 37 | 1 | 0 |
ACM transactions on graphics | 1 | 22 | 1 | 0 |
IEEE potentials | 1 | 22 | 1 | 0 |
European journal of operational research | 1 | 20 | 1 | 1 |
International workshop on neural networks for identification, control, robotics, and signal/image processing, NICROSP | 1 | 8 | 1 | 0 |
Name of Author | Documents | Citations | Links | TLS |
---|---|---|---|---|
Zhang X. | 5 | 131 | 19 | 19 |
Gao Z. | 4 | 94 | 4 | 4 |
Lu G. | 4 | 94 | 4 | 4 |
Ouyang W. | 4 | 94 | 4 | 4 |
Xu D. | 4 | 94 | 4 | 4 |
Bull D.R. | 4 | 40 | 9 | 18 |
Zhang F. | 4 | 40 | 9 | 18 |
Cramer C. | 2 | 85 | 6 | 7 |
Gelenbe E. | 2 | 68 | 11 | 12
Cai C. | 1 | 75 | 0 | 0 |
Gelenbe P. | 1 | 65 | 5 | 5 |
Sungur M. | 1 | 65 | 5 | 5 |
Document | Citations | Links | TLS |
---|---|---|---|
Lu G. (2019) | 75 | 39 | 109 |
Gelenbe E. (1996) | 65 | 9 | 18 |
Ma S. (2020) | 3 | 40 | 156 |
Chen T. (2018) | 37 | 27 | 36 |
Djelouah A. (2019) | 29 | 10 | 26 |
Afonso M. (2019) | 27 | 22 | 31 |
Kaplanyan A.S. (2019) | 22 | 15 | 22 |
Cramer Christopher (1998) | 42 | 6 | 18 |
Chen Z. (2020) | 21 | 38 | 108 |
Xu Y. (2019) | 18 | 27 | 70 |
Lu G. (2018) | 11 | 33 | 97
Soh J.W. (2018) | 9 | 33 | 111 |
Title | PageRank |
---|---|
Overview of the High-Efficiency Video Coding (HEVC) Standard (2012) | 0.003829 |
Adam: A Method for Stochastic Optimization (2014) | 0.003235 |
Image Quality Assessment: From Error Visibility to Structural Similarity (2004) | 0.002478 |
HEVC Deblocking Filter (2012) | 0.002395 |
Sample Adaptive Offset in the HEVC Standard (2012) | 0.002395 |
Title | Eccentricity |
---|---|
Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting (2015) | 7 |
Iterative Procedures for Reduction of Blocking Effects in Transform Image Coding (1992) | 7 |
Characterizing Perceptual Artifacts in Compressed Video Streams (2014) | 7 |
Multi-Frame Quality Enhancement for Compressed Video (2018) | 7 |
Image Restoration by Estimating Frequency Distribution Of Local Patches (2018) | 7 |
Title | Betweenness Centrality |
---|---|
Image Quality Assessment: From Error Visibility to Structural Similarity (2004) | 13,624.71111 |
Overview of The High-Efficiency Video Coding (HEVC) Standard (2012) | 12,780.45105 |
Compression Artifact Reduction by Overlapped-Block Transform Coefficient Estimation with Block Similarity (2013) | 10,800 |
Adam: A Method For Stochastic Optimization (2014) | 10,625.44351 |
Neural Network Approaches To Image Compression (1995) | 8439 |
Title | Eigen Centrality |
---|---|
Overview of the High-Efficiency Video Coding (HEVC) Standard (2012) | 1 |
HEVC Deblocking Filter (2012) | 0.966484 |
Sample Adaptive Offset in the HEVC Standard (2012) | 0.966484
Adaptive Loop Filtering for Video Coding (2013) | 0.955361 |
A Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising (2017) | 0.914142 |
Title | Closeness Centrality |
---|---|
A Statistical Evaluation of Recent Full Reference Image Quality Assessment Algorithms (2006) | 1 |
Overview of the High-Efficiency Video Coding (HEVC) Standard (2012) | 1 |
Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting (2015) | 1 |
Non-Local Structure-Based Filter for Video Coding (2015) | 1 |
Interweaved Prediction for Video Coding (2020) | 1 |
Video Compression Algorithm | Family | Year of Introduction | Characteristics |
---|---|---|---|
H.120 | H.xxx | 1984 | First standard by ITU. Used for video conferencing |
H.261 | H.xxx | 1990 | First practical video compression standard. Used for video transmission over communication lines.
MPEG-1 | MPEG | 1993 | First compression algorithm by MPEG. Used in Video CD. Supports audio and video storage on CD-ROMs.
MPEG-2/H.262 | H.xxx | 1995 | Used in DVD. Supports HDTV.
H.263 | H.xxx | 1996 | Significant advancement in video streaming and video conferencing. Shares a subset with MPEG-4.
MPEG-4 | MPEG | 1999 | Includes DivX and Xvid. Played a crucial role in the pre-HD era.
MPEG-4/H.264 | H.xxx | 2003 | Supports Blu-ray, HD DVD, and digital video broadcasting. Jointly published by ITU-T (H.264) and ISO/IEC (MPEG-4 Part 10, AVC).
HEVC/H.265 | H.xxx | 2013 | Live HD streaming of the data. |
VVC | MPEG | 2020 | Live HD streaming, OTT, etc. |
Types of Compression and Proposed Approaches | |
---|---|
Lossy | Lossless/Near Lossless |
Guo Lu et al. [48] | Darwish et al. [96]
Yupeng Chen et al. [95] | Wei Jia et al. [97,98] |
Sangeeta et al. [99] | Ghamsarian, N. et al. [37] |
Woongsung Park et al. [100] | Sinha, A.K. et al. [101] |
Dhungel P et al. [102] | Santamaria M et al. [103] |
Zhu S et al. [104] | Ma D et al. [105] |
Poyser M et al. [106] | He G et al. [107] |
Mameli F et al. [108] | Feng R et al. [109] |
Chen W et al. [110] | Liu D et al. [111] |
Pham T et al. [112] | Chen Z et al. [113] |
Jadhav A et al. [114] | Wu Y et al. [115] |
Lu G et al. [116] | Ma D et al. [117] |
Document | Method of Compression | Dataset Used | Application |
---|---|---|---|
Guo Lu et al. [48] 2021 | CNN | UVG, HEVC | OTT, video streaming
Yupeng Chen et al. [95] 2021 | Long-term recurrent convolutional networks (LRCN) | UCF101 | Optical texture preservation in compression |
Darwish et al. [96] 2021 | Differential Pulse Code Modulation (DPCM), Learning Vector Quantization (LVQ) | xiph.org | Video Streaming and transmission |
Wei Jia et al. [98,119] 2021 | Video-Based point cloud compression (V-PCC), CNN | CTC | Point cloud for 3-D object modeling, AR and VR |
Sangeeta et al. [99] 2021 | RNN, CNN | OTT, social media, Storage for online video content | |
Woongsung Park et al. [100] 2021 | CNN | UVG, HEVC-B, HEVC-E | Storage for online video content |
Dhungel P et al. [102] 2020 | DNN | UVG, HEVC | Storage for online video content |
Ghamsarian, N. et al. [37] 2020 | CNN | Medical Dataset- Cataract-101 | Medicine Videos-Cataract Surgery |
Sinha, A.K. et al. [101] 2020 | CNN | UVG, Kinetic 5K | Live streaming, broadcasting
Santamaria M et al. [103] 2020 | DNN | DIVerse 2K (DIV2K) | Videos with High Resolution |
Ma D et al. [105] 2020 | GAN | HEVC | Spatiotemporal data |
Zhu S et al. [104] 2020 | CNN | HEVC | Spatiotemporal data |
He G et al. [107] 2020 | ORNN | CLIC | CVF Competition |
Feng R et al. [109] 2020 | DNN | Vimeo-90K, CLIC | CVF Competition |
Liu D et al. [111] 2020 | CNN | HEVC, VVC | Real-time videos
Chen Z et al. [113] 2020 | PMCNN | Flickr | Social Media
Poyser M et al. [106] 2020 | R-CNN, GAN, encoder | Cityscapes | Real-time videos |
Mameli F et al. [108] 2020 | No-GAN | Real-time videos | |
Wu Y et al. [115] 2020 | RNN, GAN | Surveillance data | Surveillance video applications |
Chen W et al. [110] 2020 | CNN | JCT-VC | HD Videos |
Pham T et al. [112] 2020 | CNN | HMII | Video Streaming and conferencing |
Ma D et al. [117] 2020 | CNN | BVI-DVC | Video Streaming and conferencing |
Jadhav A et al. [114] 2020 | PredEncoder | Youtube Videos | Video Streaming and conferencing |
Lu G et al. [116] 2020 | DNN | Vimeo-90K, HEVC | Video Streaming and conferencing |
Document | RMSE | PSNR | MS-SSIM | BD-Rate | CA | CR | Performance
---|---|---|---|---|---|---|---|
Guo lo et al. [48] 2021 | √ | √ | PSNR gain= 0.61 dB | ||||
Yupeng Chen et al. [95] 2021 | √ | CA = 0.9311 | |||||
Darwish et al. [96] 2021 | √ | √ | CR = 5.94% improvement | ||||
Wei Jia et al. [98,119] 2021 | √ | Significant gain in 3-D artifact removal and time complexity. | |||||
Woongsung Park et al. [100] 2021 | √ | √ | MS-SSIM for HEVC-E class = 0.9958 | ||||
Dhungel P et al. [102] 2020 | √ | √ | for UVG dataset MS-SSIM = 0.980 PSNR = 38 DB | ||||
Ghamsarian, N. et al. [37] 2020 | √ | Up to 68% storage gain | |||||
Sinha, A.K. et al. [101] 2020 | √ | √ | Up to 50% improvement in encoding time | ||||
Santamaria M et al. [103] 2020 | √ | Improvement in BD Rate | |||||
Ma D et al. [105] 2020 | √ | √ | Bit rate saving up to 24.8% | ||||
Zhu S et al. [104] 2020 | √ | √ | √ | 2.59 times higher efficiency than MQP | |||
Feng R et al. [109] 2020 | √ | MS-SSIM = 0.9968 | |||||
Mameli F et al. [108] 2020 | √ | SSIM = 0.5877 | |||||
Wu Y et al. [115] 2020 | √ | √ | MS-SSIM = 0.82, PSNR = 25.69 db | ||||
Chen W et al. [110] 2020 | √ | √ | √ | PSNR = 43 dB MS-SSIM = 0.99 | |||
Pham T et al. [112] 2020 | √ | √ | PSNR Gain = 0.58 dB |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).