research-article

Optimizing Immersive Video Coding Configurations Using Deep Learning: A Case Study on TMIV

Published: 27 January 2022

Abstract

Immersive video streaming technologies improve the Virtual Reality (VR) user experience by giving users more intuitive ways to move through simulated worlds, e.g., with a 6 Degrees-of-Freedom (6DoF) interaction mode. A naive way to achieve 6DoF is to deploy cameras at the many positions and orientations that users' movements may require, which is expensive, tedious, and inefficient. A better solution for realizing 6DoF interaction is to synthesize target views on the fly from a limited number of source views. While such view synthesis is enabled by the recent Test Model for Immersive Video (TMIV) codec, TMIV relies on manually composed configurations, which cannot strike a balance among video quality, decoding time, and bandwidth consumption. In this article, we study this limitation of TMIV and solve its configuration optimization problem by searching for the optimal configuration in a huge configuration space. We first identify the critical parameters in the TMIV configurations. Then, we introduce two Neural Network (NN)-based algorithms that approach the problem from two different angles: (i) a Convolutional Neural Network (CNN) algorithm that solves a regression problem and (ii) a Deep Reinforcement Learning (DRL) algorithm that solves a decision-making problem. We conduct both objective and subjective experiments to evaluate the CNN and DRL algorithms on two diverse datasets: an equirectangular projection dataset and a perspective projection dataset. The objective evaluations reveal that both algorithms significantly outperform the default configurations. In particular, with the equirectangular (perspective) projection dataset, the proposed algorithms require only 95% (23%) of the decoding time, stream 79% (23%) of the views, and improve the utility by 6% (73%) on average. The subjective evaluations confirm that the proposed algorithms consume fewer resources while achieving Quality of Experience (QoE) comparable to that of the default and the optimal TMIV configurations.
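The abstract frames configuration selection as a search for the setting that best balances video quality, decoding time, and bandwidth. As a rough illustration of that framing, and of why brute force gets expensive, the Python sketch below runs an exhaustive search over a tiny, made-up configuration space. The knobs (`num_views`, `qp`), the synthetic `toy_outcome` model, and the linear `utility` with its weights are all hypothetical assumptions for illustration; they are not the TMIV parameters or the utility function defined in the article.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical configuration knobs, for illustration only; the article
# identifies its own set of critical TMIV parameters.
VIEW_COUNTS = range(1, 9)   # number of source views streamed
QPS = (22, 27, 32, 37)      # quantization parameter of the 2D video codec


@dataclass(frozen=True)
class Config:
    num_views: int
    qp: int


def toy_outcome(c: Config) -> tuple[float, float, float]:
    """Synthetic stand-in for encoding, decoding, and rendering with a config.

    Returns (quality, decode_time, bandwidth), each roughly in [0, 1].
    The formulas are invented to mimic the qualitative tradeoff only.
    """
    quality = 1.0 - 0.4 / c.num_views - 0.01 * (c.qp - 22)   # more views, lower QP -> better
    decode_time = c.num_views / 8                              # more views -> slower decoding
    bandwidth = (c.num_views / 8) * 2 ** (-(c.qp - 22) / 5)    # toy rate model
    return quality, decode_time, bandwidth


def utility(quality: float, decode_time: float, bandwidth: float,
            w_time: float = 0.2, w_bw: float = 0.2) -> float:
    """Hypothetical linear utility: reward quality, penalize time and bandwidth."""
    return quality - w_time * decode_time - w_bw * bandwidth


def exhaustive_search(space, evaluate):
    """Evaluate every configuration and return (best_config, best_utility)."""
    scored = ((c, utility(*evaluate(c))) for c in space)
    return max(scored, key=lambda pair: pair[1])


space = [Config(v, qp) for v, qp in product(VIEW_COUNTS, QPS)]
best, score = exhaustive_search(space, toy_outcome)
print(len(space), best, round(score, 3))
# prints: 32 Config(num_views=3, qp=22) 0.717
```

Even in this toy, streaming more views is not always better once time and bandwidth are penalized. In a real pipeline each call to `evaluate` would mean encoding, decoding, and synthesizing views, and the space grows with the product of every parameter's range. One way to read the abstract is that the CNN approach replaces `evaluate` with a learned utility predictor (regression), while the DRL approach chooses parameter values step by step instead of enumerating the space; this is an interpretation, not the authors' code.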

Cited By

  • (2024) Enhancing TMIV Performance Through Proximity-Aware Grouping and Preservation of Small Clusters. 2024 IEEE International Conference on Image Processing (ICIP), pp. 3375-3381. DOI: 10.1109/ICIP51287.2024.10647454. Online publication date: 27 Oct 2024.
  • (2022) Group-Based Adaptive Rendering System for 6DoF Immersive Video Streaming. IEEE Access 10, pp. 102691-102700. DOI: 10.1109/ACCESS.2022.3208599. Online publication date: 2022.

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 1
    January 2022
    517 pages
    ISSN: 1551-6857
    EISSN: 1551-6865
    DOI: 10.1145/3505205

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 January 2022
    Accepted: 01 June 2021
    Revised: 01 May 2021
    Received: 01 December 2020
    Published in TOMM Volume 18, Issue 1

    Author Tags

    1. Virtual reality
    2. augmented reality
    3. extended reality
    4. 3DoF+
    5. 6DoF
    6. head-mounted displays
    7. view synthesis
    8. streaming
    9. optimization

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • Ministry of Science and Technology of Taiwan

    Article Metrics

    • Downloads (last 12 months): 114
    • Downloads (last 6 weeks): 11
    Reflects downloads up to 11 Dec 2024.
