research-article

Spherical Convolution Empowered Viewport Prediction in 360 Video Multicast with Limited FoV Feedback

Published: 05 January 2023

Abstract

Field-of-view (FoV) prediction is critical in 360-degree video multicast, a key component of emerging virtual reality and augmented reality applications. Most current prediction methods that combine saliency detection with FoV information do not account for the fact that the distortion of projected 360-degree videos can invalidate the weight sharing of traditional convolutional networks, nor do they adequately consider the difficulty of obtaining complete multi-user FoV information; both omissions degrade prediction performance. This article proposes a spherical convolution-empowered FoV prediction method: a multi-source prediction framework that combines salient features extracted from the 360-degree video with limited FoV feedback information. A spherical convolutional neural network replaces the traditional two-dimensional convolutional neural network to eliminate the weight-sharing failure caused by projection distortion. Specifically, salient spatial-temporal features are extracted by a spherical convolution-based saliency detection model, and the limited FoV feedback is represented as a time series by a spherical convolution-empowered gated recurrent unit (GRU) network. Finally, the extracted salient video features are combined with the FoV time-series features to predict users' future FoVs. Experimental results show that the proposed method outperforms other prediction methods.
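The weight-sharing problem that motivates spherical convolution can be made concrete with a small calculation. Assuming the common equirectangular projection (ERP), which the abstract does not name explicitly, each pixel row of a projected frame covers a different solid angle on the sphere, so a fixed 3x3 planar kernel sees far less of the scene near the poles than at the equator. Shared 2D kernel weights therefore do not correspond to a consistent spherical neighborhood. The sketch below only quantifies that distortion; it is not the authors' spherical CNN, and the function names and frame size are illustrative assumptions.

```python
import numpy as np


def erp_row_solid_angles(height: int, width: int) -> np.ndarray:
    """Solid angle (in steradians) of one pixel in each row of an ERP frame.

    Rows are ordered top (north pole) to bottom (south pole). A pixel spanning
    longitude width dlon and latitudes [lat_lo, lat_hi] covers
    dlon * (sin(lat_hi) - sin(lat_lo)) steradians.
    """
    lat_edges = np.linspace(np.pi / 2, -np.pi / 2, height + 1)  # row boundaries
    dlon = 2 * np.pi / width
    return dlon * (np.sin(lat_edges[:-1]) - np.sin(lat_edges[1:]))


def kernel_footprint(row_angles: np.ndarray, row: int, k: int = 3) -> float:
    """Spherical area covered by a k x k planar kernel centred on `row`.

    Every pixel in a row has the same solid angle, so the footprint is k times
    the sum over the rows the kernel spans (clipped at the frame border).
    """
    half = k // 2
    rows = row_angles[max(row - half, 0): row + half + 1]
    return float(k * rows.sum())


if __name__ == "__main__":
    H, W = 180, 360  # hypothetical 1-degree-per-pixel ERP frame
    omega = erp_row_solid_angles(H, W)
    print(f"total solid angle: {omega.sum() * W:.4f} sr (4*pi = {4 * np.pi:.4f})")
    equator = kernel_footprint(omega, H // 2)
    near_pole = kernel_footprint(omega, 1)
    print(f"3x3 footprint near pole / at equator: {near_pole / equator:.3f}")
```

Running it confirms that the per-pixel solid angles sum to 4π over the whole frame, while the 3x3 footprint one row below the north pole is roughly 2.6 percent of the equatorial footprint. A spherical convolution instead defines its kernel on the sphere, so the same weights cover comparable spherical regions at every latitude.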

Cited By

  • (2025) Multiscroll Hopfield neural network with extreme multistability and its application in video encryption for IIoT. Neural Networks 182 (106904). DOI: 10.1016/j.neunet.2024.106904. Online publication date: Feb-2025.
  • (2024) Long Short-Term Memory-Based Non-Uniform Coding Transmission Strategy for a 360-Degree Video. Electronics 13, 16 (3281). DOI: 10.3390/electronics13163281. Online publication date: 19-Aug-2024.
  • (2024) Adaptive Transmission Strategy for Non-Uniform Coding of 360° Videos. Electronics 13, 16 (3266). DOI: 10.3390/electronics13163266. Online publication date: 17-Aug-2024.

    Information

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 1
    January 2023
    505 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/3572858
    Editor: Abdulmotaleb El Saddik

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 05 January 2023
    Online AM: 12 March 2022
    Accepted: 12 January 2022
    Revised: 12 January 2022
    Received: 09 July 2021
    Published in TOMM Volume 19, Issue 1

    Author Tags

    1. 360-degree video
    2. video multicast
    3. FoV prediction
    4. saliency detection
    5. spherical convolution

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • National Natural Science Foundation of China
    • Anhui Provincial Natural Science Foundation
    • Fundamental Research Funds for the Central Universities

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 605
    • Downloads (last 6 weeks): 98
    Reflects downloads up to 13 Dec 2024

    Citations

    • (2024) Enhancing target detection accuracy through cross-modal spatial perception and dual-modality fusion. Frontiers in Physics 12. DOI: 10.3389/fphy.2024.1398678. Online publication date: 6-Jun-2024.
    • (2024) Vehicle recognition pipeline via DeepSort on aerial image datasets. Frontiers in Neurorobotics 18. DOI: 10.3389/fnbot.2024.1430155. Online publication date: 16-Aug-2024.
    • (2024) DFRDRL: a dynamic fuzzy routing algorithm based on deep reinforcement learning with guaranteed latency and bandwidth for software-defined networks. Journal of Big Data 11, 1. DOI: 10.1186/s40537-024-01029-x. Online publication date: 28-Oct-2024.
    • (2024) SEMScene: Semantic-Consistency Enhanced Multi-Level Scene Graph Matching for Image-Text Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3664816. Online publication date: 11-May-2024.
    • (2024) Universal Relocalizer for Weakly Supervised Referring Expression Grounding. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 7 (1-23). DOI: 10.1145/3656045. Online publication date: 16-May-2024.
    • (2024) Context-detail-aware United Network for Single Image Deraining. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 5 (1-18). DOI: 10.1145/3639407. Online publication date: 22-Jan-2024.
    • (2024) Emotional Video Captioning With Vision-Based Emotion Interpretation Network. IEEE Transactions on Image Processing 33 (1122-1135). DOI: 10.1109/TIP.2024.3359045. Online publication date: 1-Feb-2024.