Research article · Free access · Just Accepted

Solutions, Challenges and Opportunities in Volumetric Video Streaming: An Architectural Perspective

Online AM: 21 November 2024

Abstract

Volumetric video streaming technologies underpin the future of immersive media services such as virtual-, augmented-, and mixed-reality experiences. The challenges surrounding these technologies are substantial, chiefly the high network bandwidth required to deliver high-quality, low-latency streams. Many techniques and solutions have been proposed across the streaming workflow to mitigate these challenges. To organize and better understand these developments, this survey adopts an architectural framework to present current and emerging techniques and solutions for volumetric video streaming, highlighting their characteristic challenges and opportunities.

References

[1]
Artificial Intelligence and UK National Security. [Online] Available: https://static.rusi.org/ai_national_security_final_web_version.pdf. Accessed on Jan. 22, 2024.
[2]
Contextualizing Deepfake Threats to Organizations. [Online] Available: https://media.defense.gov/2023/Sep/12/2003298925/-1/-1/0/CSI-DEEPFAKE-THREATS.PDF. Accessed on Jan. 22, 2024.
[3]
In the World of 5G, Virtualization Is Everything. [Online] Available: https://www.sdxcentral.com/5g/definitions/key-elements-5g-network/5g-virtualization/. Accessed on Feb. 9, 2024.
[4]
5G MAG. XR Unity Player. [Online] Available: https://github.com/5G-MAG/rt-xr-unity-player, note = Accessed on Sep. 02, 2024.
[5]
Ahmed Hamza and Xin Wang. ISO/IEC JTC 1/SC 29/WG 3 - CD of ISO/IEC 23090-6 AMD 2 Additional Latency Metrics and Other Improvements. [Online] Available: https://www.mpeg.org/wp-content/uploads/mpeg_meetings/143_Geneva/w22953.zip. Accessed on Sep. 02, 2024.
[6]
A. O. Al-Abbasi, V. Aggarwal, and M.-R. Ra. Multi-tier caching analysis in cdn-based over-the-top video streaming systems. IEEE/ACM Transactions on Networking, 27(2):835–847, 2019.
[7]
E. Alexiou and T. Ebrahimi. Point cloud quality assessment metric based on angular similarity. In 2018 IEEE International Conference on Multimedia and Expo (ICME), pages 1–6. IEEE, 2018.
[8]
E. Alexiou and T. Ebrahimi. Exploiting user interactivity in quality assessment of point cloud imaging. In IEEE International Conference on Quality of Multimedia Experience (QoMEX), 2019.
[9]
E. Alexiou, I. Viola, T. M. Borges, T. A. Fonseca, R. L. De Queiroz, and T. Ebrahimi. A comprehensive study of the rate-distortion performance in mpeg point cloud compression. APSIPA Transactions on Signal and Information Processing, 8:e27, 2019.
[10]
A. Anwer, S. S. Azhar Ali, A. Khan, and F. Meriaudeau. Underwater 3-D Scene Reconstruction Using Kinect v2 Based on Physical Models for Refraction and Time of Flight Correction. IEEE Access, 5:15960–15970, 2017.
[11]
Apple. Protocol extension for low-latency hls. [Online] Available: https://developer.apple.com/documentation/http_live_streaming/protocol_extension_for_low-latency_hls_preliminary_specification, 2019. Online; accessed Dec. 22 2022.
[12]
BabylonJS. Babylon.js. [Online] Available: https://github.com/BabylonJS/Babylon.js, note = Accessed on Sep. 02, 2024.
[13]
D. Bega, M. Gramaglia, M. Fiore, A. Banchs, and X. Costa-Perez. Deepcog: Cognitive network management in sliced 5g networks with deep learning. In IEEE INFOCOM 2019-IEEE conference on computer communications, pages 280–288. IEEE, 2019.
[14]
T. Bell, B. Li, and S. Zhang. Structured light techniques and applications. Wiley Encyclopedia of Electrical and Electronics Engineering, pages 1–24, 1999.
[15]
A. Bentaleb, B. Taani, A. C. Begen, C. Timmerer, and R. Zimmermann. A survey on bitrate adaptation schemes for streaming media over http. IEEE Communications Surveys & Tutorials, 21(1):562–585, 2018.
[16]
M. Bui, L.-C. Chang, H. Liu, Q. Zhao, and G. Chen. Comparative study of 3d point cloud compression methods. In 2021 IEEE International Conference on Big Data (Big Data), pages 5859–5861. IEEE, 2021.
[17]
C. Cao, M. Preda, and T. Zaharia. 3d point cloud compression: A survey. In The 24th International Conference on 3D Web Technology, 2019.
[18]
K. Cao, Y. Xu, and P. Cosman. Visual quality of compressed mesh and point cloud sequences. IEEE Access, 8:171203–171217, 2020.
[19]
C. Chen, J. Fu, and L. Lyu. A pathway towards responsible ai generated content. arXiv preprint arXiv:2303.01325, 2023.
[20]
K. Chen, C. B. Choy, M. Savva, A. X. Chang, T. Funkhouser, and S. Savarese. Text2shape: Generating shapes from natural language by learning joint embeddings. In Computer Vision–ACCV 2018: 14th Asian Conference on Computer Vision, Perth, Australia, December 2–6, 2018, Revised Selected Papers, Part III 14, pages 100–116. Springer, 2019.
[21]
Y. Chen, Y. Pan, Y. Li, T. Yao, and T. Mei. Control3d: Towards controllable text-to-3d generation. In Proceedings of the 31st ACM International Conference on Multimedia, pages 1148–1156, 2023.
[22]
Y. Choi, J.-B. Jeong, S. Lee, and E.-S. Ryu. Overview of the video-based dynamic mesh coding (v-dmc) standard work. In 2022 13th International Conference on Information and Communication Technology Convergence (ICTC), pages 578–581. IEEE, 2022.
[23]
S. R. Cox, M. Lim, and W. T. Ooi. Volvqad: an mpeg v-pcc volumetric video quality assessment dataset. In Proceedings of the 14th Conference on ACM Multimedia Systems, pages 357–362, 2023.
[24]
N.-N. Dao, A.-T. Tran, N. H. Tu, T. T. Thanh, V. N. Q. Bao, and S. Cho. A contemporary survey on live video streaming from a computation-driven perspective. ACM Computing Surveys, 2022.
[25]
DASH-IF. DASH Reference Player (dash.js). [Online] Available: https://reference.dashif.org/dash.js/, 2021. Online; accessed on Jan. 22, 2022.
[26]
DASH-IF. Webrtc-based streaming and dash aspects (report). [Online] Available: https://dashif.org/webRTC/report, 2022. Online; accessed Dec. 22 2022.
[27]
DASH-IF and DVB. Low-latency modes for dash. [Online] Available: https://dashif.org/docs/DASH-IF-IOP-CR-Low-Latency-Live-Community-Review.pdf, 2019. Online; accessed Dec. 22 2022.
[28]
David Torres Ocana. Flask pointcloud streamer. [Online] Available: https://github.com/DavidTorresOcana/pointcloud_streamer, note = Accessed on Sep. 02, 2024.
[29]
M. Debbagh. Neural radiance fields (nerfs): A review and some recent developments. arXiv preprint arXiv:2305.00375, 2023.
[30]
DepthKit. Depth Kit. [Online] Available: https://www.depthkit.tv/, note = Accessed on Dec. 15, 2022.
[31]
DepthKit. Depth Kit Documentation. [Online] Available: https://docs.depthkit.tv/?ref=dkhome, note = Accessed on Dec. 15, 2022.
[32]
S. Dijkstra-Soudarissanane, K. E. Assal, S. Gunkel, F. t. Haar, R. Hindriks, J. W. Kleinrouweler, and O. Niamut. Multi-sensor capture and network processing for virtual reality conferencing. In Proceedings of the 10th ACM Multimedia Systems Conference, pages 316–319, 2019.
[33]
R. Diniz, P. G. Freitas, and M. C. Farias. Local luminance patterns for point cloud quality assessment. In IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP), 2020.
[34]
R. Diniz, P. G. Freitas, and M. C. Farias. Color and geometry texture descriptors for point-cloud quality assessment. IEEE Signal Processing Letters, 2021.
[35]
G. Diraco, A. Leone, and P. Siciliano. Human posture recognition with a time-of-flight 3d sensor for in-home applications. Expert Systems with Applications, 40(2):744–751, 2013.
[36]
S. Dost, F. Saud, M. Shabbir, M. G. Khan, M. Shahid, and B. Lovstrom. Reduced reference image and video quality assessments: review of methods. EURASIP Journal on Image and Video Processing, 2022.
[37]
K. Dunkley, A. Dunkley, J. Drewnicki, I. Keith, and J. E. Herbert-Read. A low-cost, long-running, open-source stereo camera for tracking aquatic species and their behaviours. Methods in Ecology and Evolution, 14(10):2549–2556, 2023.
[38]
A. Dziembowski, D. Mieloch, J. Stankowski, and A. Grzelka. Iv-psnr—the objective quality metric for immersive video applications. IEEE Transactions on Circuits and Systems for Video Technology, 32(11):7575–7591, 2022.
[39]
E. d’Eon. ISO/IEC JTC1/SC29 joint WG11/WG1 (MPEG/JPEG) input document WG1M40059/WG1M74006. 8i voxelized full bodies - a voxelized point cloud dataset. [Online] Available: https://unity.com/solutions/film-animation-cinematics. Accessed on Dec. 19, 2022.
[40]
EF-EVE. EF Eye Volumetric Capture (Volcapp). [Online] Available: https://ef-eve.com/, note = Accessed on Dec. 15, 2022.
[41]
ESRI. Limited Error Point Cloud Compression. [Online] Available: https://github.com/Esri/lepcc. Accessed on Dec. 25, 2023.
[42]
ESRI. Limited Error Raster Compression. [Online] Available: https://github.com/Esri/lerc. Accessed on Dec. 25, 2023.
[43]
J.-P. Farrugia, L. Billaud, and G. Lavoué. Adaptive streaming of 3d content for web-based virtual reality: an open-source prototype including several metrics and strategies. In Proceedings of the 14th Conference on ACM Multimedia Systems, pages 430–436, 2023.
[44]
O. Fleisher and S. Anlen. Volume: 3d reconstruction of history for immersive platforms. In ACM SIGGRAPH 2018 Posters, pages 1–2. 2018.
[45]
C. Fu, C. Mertz, and J. M. Dolan. Lidar and monocular camera fusion: On-road depth completion for autonomous driving. In 2019 IEEE Intelligent Transportation Systems Conference (ITSC), pages 273–278. IEEE, 2019.
[46]
V. Gandhi, J. Čech, and R. Horaud. High-resolution depth maps based on tof-stereo fusion. In 2012 IEEE International Conference on Robotics and Automation, pages 4742–4749. IEEE, 2012.
[47]
J. Gené-Mola, J. Llorens, J. R. Rosell-Polo, E. Gregorio, J. Arnó, F. Solanelles, J. A. Martínez-Casasnovas, and A. Escolà. Assessing the performance of rgb-d sensors for 3d fruit crop canopy characterization under different operating and lighting conditions. Sensors, 2020.
[48]
S. Giancola, M. Valenti, and R. Sala. A survey on 3d cameras: Metrological comparison of time-of-flight, structured-light and active stereoscopy technologies. 2018.
[49]
T. Golla and R. Klein. Real-time point cloud compression. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5087–5092. IEEE, 2015.
[50]
Google. draco. [Online] Available: https://github.com/google/draco, note = Accessed on Sep. 02, 2024.
[51]
Google. Google's Draco 3D Data Compression. [Online] Available: https://google.github.io/draco/. Accessed on Dec. 25, 2023.
[52]
D. Graziosi, O. Nakagami, S. Kuma, A. Zaghetto, T. Suzuki, and A. Tabatabai. An overview of ongoing point cloud compression standardization activities: Video-based (v-pcc) and geometry-based (g-pcc). APSIPA Transactions on Signal and Information Processing, 2020.
[53]
GStreamer. What is GStreamer? [Online] Available: https://gstreamer.freedesktop.org/documentation/application-development/introduction/gstreamer.html, 2023. Online; accessed on Feb. 10, 2023.
[54]
S. Gül, D. Podborski, T. Buchholz, T. Schierl, and C. Hellge. Low-latency cloud-based volumetric video streaming using head motion prediction. In ACM NOSSDAV, 2020.
[55]
S. Gül, D. Podborski, T. Buchholz, T. Schierl, and C. Hellge. Low latency volumetric video edge cloud streaming. arXiv preprint arXiv:2001.06466, 2020.
[56]
S. Gül, D. Podborski, J. Son, G. S. Bhullar, T. Buchholz, T. Schierl, and C. Hellge. Cloud rendering-based volumetric video streaming system for mixed reality services. In ACM MMSys, 2020.
[57]
Gul, Serhan. 6DoF Dataset. [Online] Available: https://github.com/serhangul/dataset_6DoF, note = Accessed on Dec. 05, 2022.
[58]
J. Guo, D. Weng, Z. Zhang, Y. Liu, H. B.-L. Duh, and Y. Wang. Subjective and objective evaluation of visual fatigue caused by continuous and discontinuous use of hmds. Journal of the Society for Information Display, 27(2):108–119, 2019.
[59]
B. Han, Y. Liu, and F. Qian. Vivo: visibility-aware mobile volumetric video streaming. In Proceedings of the 26th Annual International Conference on Mobile Computing and Networking, pages 1–13, 2020.
[60]
Y. He, X. Ren, D. Tang, Y. Zhang, X. Xue, and Y. Fu. Density-preserving deep point cloud compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2333–2342, 2022.
[61]
HoloCap. Holo Cap. [Online] Available: https://holocap.com/, note = Accessed on Dec. 15, 2022.
[62]
Y. Huang, B. Bai, Y. Zhu, X. Qiao, X. Su, and P. Zhang. Iscom: Interest-aware semantic communication scheme for point cloud video streaming. arXiv preprint arXiv:2210.06808, 2022.
[63]
IETF. Media over quic: charter-ietf-moq-01. [Online] Available: https://datatracker.ietf.org/doc/charter-ietf-moq/, 2022. Online; accessed Dec. 22 2022.
[64]
Immersiveshooter. Volumetric video is so much more than VR. [Online] Available: https://www.immersiveshooter.com/2019/01/10/volumetric-video-means-so-much-more-than-vr/. Accessed on Jan. 2, 2023.
[65]
Intel. Intel RealSense Technology. [Online] Available: https://www.intel.com/content/www/us/en/architecture-and-technology/realsense-overview.html, note = Accessed on Dec. 05, 2022.
[66]
IO Industries. Products for Volumetric Capture. [Online] Available: https://www.ioindustries.com/volumetric-capture, note = Accessed on Dec. 15, 2022.
[67]
ISO/IEC. ISO/IEC 23000-19:2020 Information technology – Multimedia application format (MPEG-A) – Part 19: Common media application format (CMAF) for segmented media. [Online] Available: https://www.iso.org/standard/79106.html, 2020. Online; accessed on Apr. 23, 2020.
[68]
J. Jansen, S. Subramanyam, R. Bouqueau, G. Cernigliaro, M. M. Cabré, F. Pérez, and P. Cesar. A pipeline for multiparty volumetric video conferencing: transmission of point clouds over low latency dash. In Proceedings of the 11th ACM Multimedia Systems Conference, pages 341–344, 2020.
[69]
A. Javaheri, C. Brites, F. Pereira, and J. Ascenso. A generalized hausdorff distance based quality metric for point cloud geometry. In 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), pages 1–6. IEEE, 2020.
[70]
A. Javaheri, C. Brites, F. Pereira, and J. Ascenso. Point cloud rendering after coding: Impacts on subjective and objective quality. IEEE Transactions on Multimedia, 23:4049–4064, 2020.
[71]
L. Keselman, J. Iselin Woodfill, A. Grunnet-Jepsen, and A. Bhowmik. Intel realsense stereoscopic depth cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1–10, 2017.
[72]
S.-Y. Kim, M. Kim, and Y.-S. Ho. Depth image filter for mixed and noisy pixel removal in rgb-d camera systems. IEEE Transactions on Consumer Electronics, 59(3):681–689, 2013.
[73]
M. Kowalski, J. Naruniec, and M. Daniluk. Livescan3d: A fast and inexpensive 3d data acquisition system for multiple kinect v2 sensors. In 2015 international conference on 3D vision, pages 318–325. IEEE, 2015.
[74]
D. Kreutz, F. M. Ramos, P. E. Verissimo, C. E. Rothenberg, S. Azodolmolky, and S. Uhlig. Software-defined networking: A comprehensive survey. Proceedings of the IEEE, 103(1):14–76, 2014.
[75]
Y. W. Kuan, N. O. Ee, and L. S. Wei. Comparative study of intel R200, Kinect v2, and primesense RGB-D sensors performance outdoors. IEEE Sensors Journal, 19(19):8741–8750, 2019.
[76]
T. F. Lam, H. Blum, R. Siegwart, and A. Gawel. Sl sensor: An open-source, real-time and robot operating system-based structured light sensor for high accuracy construction robotic applications. Automation in Construction, 142:104424, 2022.
[77]
D. Lazzarotto, E. Alexiou, and T. Ebrahimi. Benchmarking of objective quality metrics for point cloud compression. In 2021 IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP), pages 1–6. IEEE, 2021.
[78]
K. Lee, J. Yi, Y. Lee, S. Choi, and Y. M. Kim. Groot: a real-time streaming system of high-fidelity volumetric videos. In Proceedings of the 26th Annual International Conference on Mobile Computing and Networking, pages 1–14, 2020.
[79]
R. Lee, S. I. Venieris, and N. D. Lane. Deep neural network–based enhancement for image and video streaming systems: a survey and future directions. ACM Computing Surveys (CSUR), 2021.
[80]
J. Li, C. Zhang, Z. Liu, R. Hong, and H. Hu. Optimal volumetric video streaming with hybrid saliency based tiling. IEEE Transactions on Multimedia, 25:2939–2953, 2022.
[81]
L. Li, Z. Li, V. Zakharchenko, J. Chen, and H. Li. Advanced 3d motion prediction for video-based dynamic point cloud compression. IEEE Transactions on Image Processing, 29:289–302, 2019.
[82]
X. Li, H. Xiong, X. Li, X. Wu, X. Zhang, J. Liu, J. Bian, and D. Dou. Interpretable deep learning: Interpretation, interpretability, trustworthiness, and beyond. Knowledge and Information Systems, 64(12):3197–3234, 2022.
[83]
Q. Liu, H. Su, Z. Duanmu, W. Liu, and Z. Wang. Perceptual quality assessment of colored 3d point clouds. IEEE Transactions on Visualization and Computer Graphics, 29(8):3642–3655, 2022.
[84]
Q. Liu, H. Yuan, R. Hamzaoui, H. Su, J. Hou, and H. Yang. Reduced reference perceptual quality model with application to rate control for video-based point cloud compression. IEEE Transactions on Image Processing, 2021.
[85]
Y. Liu, B. Han, F. Qian, A. Narayanan, and Z.-L. Zhang. Vues: practical mobile volumetric video streaming through multiview transcoding. In Proceedings of the 28th Annual International Conference on Mobile Computing And Networking, pages 514–527, 2022.
[86]
Y. Liu, Q. Yang, Y. Xu, and L. Yang. Point cloud quality assessment: Dataset construction and learning-based no-reference metric. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2022.
[87]
Y. Liu, Q. Yang, Y. Xu, and L. Yang. Point cloud quality assessment: Dataset construction and learning-based no-reference metric. ACM Transactions on Multimedia Computing, Communications and Applications, 19(2s):1–26, 2023.
[88]
Z. Liu, Q. Li, X. Chen, C. Wu, S. Ishihara, J. Li, and Y. Ji. Point cloud video streaming: Challenges and solutions. IEEE Network, 35(5):202–209, 2021.
[89]
Z. Liu, Y. Wang, X. Qi, and C.-W. Fu. Towards implicit text-guided 3d shape generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17896–17906, 2022.
[90]
H. Lu and H. Shi. Deep learning for 3d point cloud understanding: a survey. arXiv preprint arXiv:2009.08920, 2020.
[91]
MagicLeap. Magic Leap 1. [Online] Available: https://www.magicleap.com/en-us/magic-leap-1. Accessed on Jan. 12, 2023.
[92]
MagicLeap. Magic Leap 1. [Online] Available: https://developer.magicleap.com/en-us/learn/guides/design-why-magic-leap. Accessed on Jan. 29, 2023.
[93]
A. Maglo, G. Lavoué, F. Dupont, and C. Hudelot. 3d mesh compression: Survey, comparisons, and emerging trends. ACM Computing Surveys (CSUR), 47(3):1–41, 2015.
[94]
F. Marinello, A. Pezzuolo, D. Cillis, L. Sartori, et al. Kinect 3d reconstruction for quantification of grape bunches volume and mass. Engineering for Rural Development, 15:876–881, 2016.
[95]
A. Martin, J. Egaña, J. Flórez, J. Montalban, I. G. Olaizola, M. Quartulli, R. Viola, and M. Zorrilla. Network resource allocation system for qoe-aware delivery of media services in 5g networks. IEEE Transactions on Broadcasting, 64(2):561–574, 2018.
[96]
R. Marx, J. Herbots, W. Lamotte, and P. Quax. Same standards, different decisions: A study of quic and http/3 implementation diversity. In Proceedings of the Workshop on the Evolution, Performance, and Interoperability of QUIC, pages 14–20, 2020.
[97]
M. Masood, M. Nawaz, K. M. Malik, A. Javed, A. Irtaza, and H. Malik. Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward. Applied intelligence, 53(4):3974–4026, 2023.
[98]
M. McGill, D. Boland, R. Murray-Smith, and S. Brewster. A dose of reality: Overcoming usability challenges in vr head-mounted displays. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pages 2143–2152, 2015.
[99]
R. Mekuria, K. Blom, and P. Cesar. Design, implementation, and evaluation of a point cloud codec for tele-immersive video. IEEE Transactions on Circuits and Systems for Video Technology, 27(4):828–842, 2016.
[100]
Meta. Facebook360 Depth Estimation Pipeline. [Online] Available: https://github.com/facebook/facebook360_dep, note = Accessed on Dec. 15, 2022.
[101]
G. Meynet, Y. Nehmé, J. Digne, and G. Lavoué. Pcqm: A full-reference quality metric for colored 3d point clouds. In 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), pages 1–6. IEEE, 2020.
[102]
Microsoft Azure. Azure Kinect DK. [Online] Available: https://azure.microsoft.com/en-us/services/kinect-dk/. Accessed on Dec. 22, 2023.
[103]
B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
[104]
Y. Mirsky and W. Lee. The creation and detection of deepfakes: A survey. ACM Computing Surveys (CSUR), 54(1):1–41, 2021.
[105]
MIV Team. MPEG Immersive video (MIV). [Online] Available: https://mpeg-miv.org/index.php/reference-software/, note = Accessed on Sep. 02, 2024.
[106]
B. Montazeri, Y. Li, M. Alizadeh, and J. Ousterhout. Homa: A receiver-driven low-latency transport protocol using network priorities. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, pages 221–235, 2018.
[107]
MPEG. MPEG-I: Video-based dynamic mesh coding. [Online] Available: https://www.mpeg.org/standards/MPEG-I/29/, note = Accessed on Sep. 02, 2024.
[108]
MPEG. MPEG Point Cloud Compression. [Online] Available: https://mpeg-pcc.org/, note = Accessed on Dec. 15, 2022.
[109]
MPEG Group. mpeg-pcc-tmc13. [Online] Available: https://github.com/MPEGGroup/mpeg-pcc-tmc13, note = Accessed on Sep. 02, 2024.
[110]
MPEG Group. mpeg-pcc-tmc2. [Online] Available: https://github.com/MPEGGroup/mpeg-pcc-tmc2, note = Accessed on Sep. 02, 2024.
[111]
MPEG-I. MPEG-I: Immersive Media Metrics. [Online] Available: https://www.mpeg.org/standards/MPEG-I/6/, note = Accessed on Sep. 02, 2024.
[112]
K. G. Nalbant and Ş. UYANIK. Computer vision in the metaverse. Journal of Metaverse, 2021.
[113]
M. Nguyen, S. Vats, S. Van Damme, J. Van der Hooft, M. T. Vega, T. Wauters, F. De Turck, C. Timmerer, and H. Hellwagner. Characterization of the quality of experience and immersion of point cloud video sequences through a subjective study. Ieee Access, 2023.
[114]
I. Niskanen, M. Immonen, L. Hallman, G. Yamamuchi, M. Mikkonen, T. Hashimoto, Y. Nitta, P. Keränen, J. Kostamovaara, and R. Heikkilä. Time-of-flight sensor for getting shape model of automobiles toward digital 3d imaging approach of autonomous driving. Automation in Construction, 121:103429, 2021.
[115]
nus-vv-streams. VVTk: A Toolkit for Volumetric Video Researchers. [Online] Available: https://github.com/nus-vv-streams/vvtk, note = Accessed on Sep. 02, 2024.
[116]
E. Özbay and A. Çinar. A voxelize structured refinement method for registration of point clouds from kinect sensors. Engineering Science and Technology, an International Journal, 22(2):555–568, 2019.
[117]
R. Pandey, A. Tkach, S. Yang, P. Pidlypenskyi, J. Taylor, R. Martin-Brualla, A. Tagliasacchi, G. Papandreou, P. Davidson, C. Keskin, et al. Volumetric capture of humans with a single rgbd camera via semi-parametric learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9709–9718, 2019.
[118]
B. Parr, M. Legg, and F. Alam. Analysis of depth cameras for proximal sensing of grapes. Sensors, 22(11):4179, 2022.
[119]
T. I. Partners. Volumetric video market size ($9,685.7mn by 2028) growth forecast at 26.2% cagr during 2021 to 2028 covid impact and global analysis by theinsightpartners.com. [Online] Available: https://www.globenewswire.com/en/news-release/2021/10/12/2312361/0/en/Volumetric-Video-Market-Size-9-685-7Mn-by-2028-Growth-Forecast-at-26-2-CAGR-During-2021-to-2028-COVID-Impact-and-Global-Analysis-by-TheInsightPartners-com.html, 2021. Online; accessed on Dec. 22, 2022.
[120]
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[121]
Point Cloud Library (PCL). Point Cloud Streaming to Mobile Devices with Real-time Visualization. [Online] Available: https://github.com/PointCloudLibrary/pcl/blob/master/doc/tutorials/content/mobile_streaming.rst, note = Accessed on Sep. 02, 2024.
[122]
B. Poole, A. Jain, J. T. Barron, and B. Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988, 2022.
[123]
Potree. potree. [Online] Available: https://github.com/potree/potree, note = Accessed on Sep. 02, 2024.
[124]
A. P. Pozo, M. Toksvig, T. F. Schrager, J. Hsu, U. Mathur, A. Sorkine-Hornung, R. Szeliski, and B. Cabral. An integrated 6dof video camera and system design. ACM Transactions on Graphics (TOG), 38(6):1–16, 2019.
[125]
C. R. Qi, L. Yi, H. Su, and L. J. Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in neural information processing systems, 30, 2017.
[126]
F. Qian, B. Han, J. Pair, and V. Gopalakrishnan. Toward practical volumetric video streaming on commodity smartphones. In Proceedings of the 20th International Workshop on Mobile Computing Systems and Applications, pages 135–140, 2019.
[127]
R. Walch et al. HTTP Live Streaming (HLS). [Online] Available: https://github.com/video-dev/hls.js/, 2021. Online; accessed on Jan. 04, 2022.
[128]
Radu Bogdan Rusu and Steve Cousins. NFL and Verizon Team Up on 5G Development. [Online] Available: https://github.com/PointCloudLibrary/pcl. Accessed on Jan. 22, 2023.
[129]
Radu Bogdan Rusu and Steve Cousins. PCL Repository at GitHub. [Online] Available: https://github.com/PointCloudLibrary/pcl. Accessed on Jan. 22, 2023.
[130]
Radu Bogdan Rusu and Steve Cousins. Point Cloud Library. [Online] Available: https://pointclouds.org/. Accessed on Jan. 22, 2023.
[131]
E. Ramadan, A. Narayanan, U. K. Dayalan, R. A. Fezeu, F. Qian, and Z.-L. Zhang. Case for 5g-aware video streaming applications. In Proceedings of the 1st Workshop on 5G Measurements, Modeling, and Use Cases, pages 27–34, 2021.
[132]
R. Rassool. Vmaf reproducibility: Validating a perceptual practical video quality metric. In 2017 IEEE international symposium on broadband multimedia systems and broadcasting (BMSB), pages 1–2. IEEE, 2017.
[133]
H. K. Ravuri, M. Torres Vega, J. van der Hooft, T. Wauters, and F. De Turck. Partially reliable transport layer for quicker interactive immersive media delivery. In Proceedings of the 1st Workshop on Interactive eXtended Reality, pages 41–49, 2022.
[134]
H. K. Ravuri, M. T. Vega, J. Van Der Hooft, T. Wauters, and F. De Turck. Adaptive partially reliable delivery of immersive media over quic-http/3. IEEE Access, 2023.
[135]
R. B. Ribera, T. Kim, J. Kim, and N. Hur. Dense depth map acquisition system for 3dtv applications based on active stereo and structured light integration. In Advances in Multimedia Information Processing-PCM 2009: 10th Pacific Rim Conference on Multimedia, Bangkok, Thailand, December 15-18, 2009 Proceedings 10, pages 499–510. Springer, 2009.
[136]
C. Richardt, J. Tompkin, and G. Wetzstein. Capture, reconstruction, and representation of the visual real world for virtual reality. In Real VR–Immersive Digital Reality, pages 3–32. Springer, 2020.
[137]
Roadtovr. Google's ‘Welcome to Light Fields’ VR App Reveals the Power of Volumetric Capture. [Online] Available: https://www.roadtovr.com/googles-welcome-to-lightfields-vr-app-reveals-the-power-of-volumetric-capture/. Accessed on Dec. 22, 2023.
[138]
B. Roessle, J. T. Barron, B. Mildenhall, P. P. Srinivasan, and M. Nießner. Dense depth priors for neural radiance fields from sparse input views. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12892–12901, 2022.
[139]
R. B. Rusu and S. Cousins. 3D is here: Point Cloud Library (PCL). In IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, May 9-13 2011.
[140]
R. Schnabel and R. Klein. Octree-based point-cloud compression. PBG@ SIGGRAPH, 2006.
[141]
O. Schreer, I. Feldmann, P. Kauff, P. Eisert, D. Tatzelt, C. Hellge, K. Müller, S. Bliedung, and T. Ebner. Lessons learned during one year of commercial volumetric video production. SMPTE Motion Imaging Journal, 129(9):31–37, 2020.
[142]
O. Schreer, I. Feldmann, S. Renault, M. Zepp, M. Worchel, P. Eisert, and P. Kauff. Capture and 3d video processing of volumetric video. In 2019 IEEE International Conference on Image Processing (ICIP), pages 4310–4314. IEEE, 2019.
[143]
S. Schwarz, M. Preda, V. Baroncini, M. Budagavi, P. Cesar, P. A. Chou, R. A. Cohen, M. Krivokuća, S. Lasserre, Z. Li, et al. Emerging mpeg standards for point cloud compression. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 9(1):133–148, 2018.
[144]
K. Seshadrinathan, R. Soundararajan, A. C. Bovik, and L. K. Cormack. Study of subjective and objective quality assessment of video. IEEE transactions on Image Processing, 19(6):1427–1441, 2010.
[145]
H. R. Sheikh and A. C. Bovik. A visual information fidelity approach to video quality assessment. In The first international workshop on video processing and quality metrics for consumer electronics, volume 7, pages 2117–2128. sn, 2005.
[146]
M. A. Shirazi, D. Khan, M. Affan, H. A. Poonja, M. S. A. Shah, and R. Uddin. Active stereo vision based 3d reconstruction for image guided surgery. In 2021 International Conference on Robotics and Automation in Industry (ICRAI), pages 1–5. IEEE, 2021.
[147]
J. Stankowski and A. Dziembowski. Iv-psnr: Software for immersive video objective quality evaluation. SoftwareX, 24:101592, 2023.
[148]
Steampowered. Welcome to Light Fields by Google. [Online] Available: https://store.steampowered.com/app/771310/Welcome_to_Light_Fields. Accessed on Jan. 5, 2023.
[149]
V. Sterzentsenko, A. Karakottas, A. Papachristou, N. Zioulis, A. Doumanoglou, D. Zarpalas, and P. Daras. A low-cost, flexible and portable volumetric capturing system. In 2018 14th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), pages 200–207. IEEE, 2018.
[150]
S. Subramanyam, J. Li, I. Viola, and P. Cesar. Comparing the Quality of Highly Realistic Digital Humans in 3DoF and 6DoF: A Volumetric Video Case Study. Proceedings - 2020 IEEE Conference on Virtual Reality and 3D User Interfaces, VR 2020, (March):127–136, 2020.
[151]
S. Subramanyam, I. Viola, J. Jansen, E. Alexiou, A. Hanjalic, and P. Cesar. Subjective qoe evaluation of user-centered adaptive streaming of dynamic point clouds. In 2022 14th International Conference on Quality of Multimedia Experience (QoMEX), pages 1–6. IEEE, 2022.
[152]
H. Sun and Y. He. The application and user acceptance of aigc in network audiovisual field: Based on the perspective of social cognitive theory. In Proceedings of the 7th International Conference on Computer Science and Application Engineering, pages 1–5, 2023.
[153]
H. Suresh and J. V. Guttag. A framework for understanding unintended consequences of machine learning. arXiv preprint arXiv:1901.10002, 2(8), 2019.
[154]
T. Taleb, P. A. Frangoudis, I. Benkacem, and A. Ksentini. Cdn slicing over a multi-domain edge cloud. IEEE Transactions on Mobile Computing, 19(9):2010–2027, 2019.
[155]
G. Tech, Y. Chen, K. Müller, J.-R. Ohm, A. Vetro, and Y.-K. Wang. Overview of the multiview and 3d extensions of high efficiency video coding. IEEE Transactions on Circuits and Systems for Video Technology, 26(1):35–49, 2015.
[156]
D. Thanou, P. A. Chou, and P. Frossard. Graph-based compression of dynamic 3d point cloud sequences. IEEE Transactions on Image Processing, 25(4):1765–1778, 2016.
[157]
Three.js. Three.js. [Online] Available: https://threejs.org, note = Accessed on Sep. 02, 2024.
[158]
D. Tian, H. Ochimizu, C. Feng, R. Cohen, and A. Vetro. Geometric distortion metrics for point cloud compression. In IEEE International Conference on Image Processing (ICIP), 2017.
[159]
H. Tian, T. Zhu, W. Liu, and W. Zhou. Image fairness in deep learning: problems, models, and challenges. Neural Computing and Applications, 34(15):12875–12893, 2022.
[160]
X. Tian, Y.-L. Yang, and Q. Wu. Shapescaffolder: Structure-aware 3d shape generation from text. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2715–2724, 2023.
[161]
Tolga Birdal. Multiple Point Cloud Renderer using Mitsuba 2. [Online] Available: https://github.com/tolgabirdal/Mitsuba2PointCloudRenderer. Accessed on Sep. 02, 2024.
[162]
E. M. Torlig, E. Alexiou, T. A. Fonseca, R. L. de Queiroz, and T. Ebrahimi. A novel methodology for quality assessment of voxelized point clouds. In Applications of Digital Image Processing XLI. SPIE, 2018.
[163]
F. Tsalakanidou, F. Forster, S. Malassiotis, and M. G. Strintzis. Real-time acquisition of depth and color images using structured light and its application to 3d face recognition. Real-Time Imaging, 11(5-6):358–369, 2005.
[164]
D. Tsui, K. Ramos, C. Melentyev, A. Rajan, M. Tam, M. Jo, F. Ahadian, and F. E. Talke. A low-cost, open-source-based optical surgical navigation system using stereoscopic vision. Microsystem Technologies, pages 1–9, 2024.
[165]
R. Tu, G. Jiang, M. Yu, T. Luo, Z. Peng, and F. Chen. V-PCC projection based blind point cloud quality assessment for compression distortion. IEEE Transactions on Emerging Topics in Computational Intelligence, 2022.
[166]
Unity. Unity Engine. [Online] Available: https://unity.com/solutions/film-animation-cinematics. Accessed on Jan. 29, 2023.
[167]
Unreal. Unreal Engine. [Online] Available: https://www.unrealengine.com/en-US/. Accessed on Jan. 29, 2023.
[168]
S. Van Damme, M. T. Vega, J. van der Hooft, and F. De Turck. Clustering-based psychometric no-reference quality model for point cloud video. In 2022 IEEE International Conference on Image Processing (ICIP), 2022.
[169]
J. van der Hooft, M. Torres Vega, T. Wauters, H. K. Ravuri, C. Timmerer, H. Hellwagner, and F. De Turck. Towards 6DoF virtual reality video streaming: Status and challenges. IEEE ComSoc MMTC Communications Frontiers, 14(5):30–37, 2019.
[170]
J. van der Hooft, M. T. Vega, C. Timmerer, A. C. Begen, F. De Turck, and R. Schatz. Objective and subjective QoE evaluation for adaptive point cloud streaming. In 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), pages 1–6. IEEE, 2020.
[171]
J. van der Hooft, M. T. Vega, T. Wauters, C. Timmerer, A. C. Begen, F. De Turck, and R. Schatz. From capturing to rendering: Volumetric media delivery with six degrees of freedom. IEEE Communications Magazine, 58(10):49–55, 2020.
[172]
J. van der Hooft, T. Wauters, F. De Turck, C. Timmerer, and H. Hellwagner. Towards 6DoF HTTP adaptive streaming through point cloud compression. In Proceedings of the 27th ACM International Conference on Multimedia, pages 2405–2413, 2019.
[173]
VCL3D. A Portable, Flexible and Facile Volumetric Capture System. [Online] Available: https://github.com/VCL3D/VolumetricCapture/releases. Accessed on Dec. 15, 2022.
[174]
VCL3D. Data Acquisition by Volumetric Capture. [Online] Available: https://vcl3d.github.io/VolumetricCapture/docs/acquisition/. Accessed on Dec. 15, 2022.
[175]
VCL3D. Volumetric Capture. [Online] Available: https://vcl3d.github.io/VolumetricCapture/. Accessed on Dec. 15, 2022.
[176]
I. Viola and P. Cesar. A reduced reference metric for visual quality evaluation of point cloud contents. IEEE Signal Processing Letters, 2020.
[177]
I. Viola and P. Cesar. Volumetric video streaming: Current approaches and implementations. Immersive Video Technologies, pages 425–443, 2023.
[178]
I. Viola, S. Subramanyam, and P. Cesar. A color-based objective quality metric for point cloud contents. In IEEE International Conference on Quality of Multimedia Experience (QoMEX), 2020.
[179]
Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
[180]
Xtionprolive. Asus Xtion Pro Live. [Online] Available: http://xtionprolive.com/asus-xtion-pro-live. Accessed on Dec. 05, 2022.
[181]
Q. Yang, H. Chen, Z. Ma, Y. Xu, R. Tang, and J. Sun. Predicting the perceptual quality of point cloud: A 3d-to-2d projection-based exploration. IEEE Transactions on Multimedia, 23:3877–3891, 2020.
[182]
Q. Yang, Y. Liu, S. Chen, Y. Xu, and J. Sun. No-reference point cloud quality assessment via domain adaptation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
[183]
Q. Yang, Z. Ma, Y. Xu, Z. Li, and J. Sun. Inferring point cloud quality via graph similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(6):3015–3029, 2020.
[184]
A. Yaqoob, T. Bi, and G.-M. Muntean. A survey on adaptive 360° video streaming: Solutions, challenges and opportunities. IEEE Communications Surveys & Tutorials, 2020.
[185]
B. Yi, X. Wang, K. Li, M. Huang, et al. A comprehensive survey of network function virtualization. Computer Networks, 133:212–262, 2018.
[186]
D. You, T. V. Doan, R. Torre, M. Mehrabi, A. Kropp, V. Nguyen, H. Salah, G. T. Nguyen, and F. H. Fitzek. Fog computing as an enabler for immersive media: Service scenarios and research opportunities. IEEE Access, 7:65797–65810, 2019.
[187]
A. Yu, V. Ye, M. Tancik, and A. Kanazawa. pixelnerf: Neural radiance fields from one or few images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4578–4587, 2021.
[188]
E. Zerman, P. Gao, C. Ozcinar, and A. Smolic. Subjective and objective quality assessment for volumetric video compression. Electronic Imaging, 2019.
[189]
E. Zerman, C. Ozcinar, P. Gao, and A. Smolic. Textured mesh vs coloured point cloud: A subjective study for volumetric video compression. In 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX), pages 1–6. IEEE, 2020.
[190]
A. Zhang, C. Wang, B. Han, and F. Qian. Efficient volumetric video streaming through super resolution. In Proceedings of the 22nd International Workshop on Mobile Computing Systems and Applications, pages 106–111, 2021.
[191]
A. Zhang, C. Wang, B. Han, and F. Qian. YuZu: Neural-enhanced volumetric video streaming. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22), pages 137–154, 2022.
[192]
A. Zhang, C. Wang, X. Liu, B. Han, and F. Qian. Mobile volumetric video streaming enhanced by super resolution. In Proceedings of the 18th International Conference on Mobile Systems, Applications, and Services, pages 462–463, 2020.
[193]
Q.-s. Zhang and S.-C. Zhu. Visual interpretability for deep learning: a survey. Frontiers of Information Technology & Electronic Engineering, 19(1):27–39, 2018.
[194]
Z. Zhang, W. Sun, X. Min, T. Wang, W. Lu, and G. Zhai. No-reference quality assessment for 3d colored point cloud and mesh models. IEEE Transactions on Circuits and Systems for Video Technology, 2022.
[195]
P. Zhou, J. Zhu, Y. Wang, Y. Lu, Z. Wei, H. Shi, Y. Ding, Y. Gao, Q. Huang, Y. Shi, et al. Vetaverse: Technologies, applications, and visions toward the intersection of metaverse, vehicles, and transportation systems. arXiv preprint arXiv:2210.15109, 2022.
[196]
W. Zhu, Z. Ma, Y. Xu, and L. Li. View-dependent dynamic point cloud compression. IEEE Transactions on Circuits and Systems for Video Technology, 31(2):765–781, 2021.


      Published In

      ACM Transactions on Multimedia Computing, Communications, and Applications (Just Accepted)
      EISSN: 1551-6865

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Online AM: 21 November 2024
      Accepted: 13 November 2024
      Revised: 17 September 2024
      Received: 26 April 2024

      Author Tags

      1. Immersive media
      2. volumetric videos
      3. 6DoF
      4. fully 360-degree videos
      5. adaptive video streaming
      6. high bandwidth
      7. low latency
      8. QoE
      9. AR
      10. VR
      11. MR
      12. XR
