Abstract
Distributed parallel rendering is a valuable approach to navigating large-scale scenes. However, previous work has typically focused on outputting ultra-high-resolution images. In this paper, we aim to improve the interactivity of navigation and propose GuideRender, a large-scale scene navigation method based on multi-modal view frustum movement prediction. Given previous frames, user inputs, and object information, GuideRender first extracts frame, user-input, and object features spatially and temporally with a multi-modal extractor. To obtain effective fused features for prediction, we introduce an attentional guidance fusion module that fuses these features from different domains with attention. Finally, we predict the movement of the view frustum from the attentional fused features and obtain its future state, so that scene data can be loaded in advance to reduce latency. In addition, to facilitate GuideRender, we design an object hierarchy hybrid tree for scene management based on object distribution and hierarchy, and an adaptive virtual sub-frustum decomposition method for task decomposition based on the relationship between rendering cost and rendering-node capacity. Experimental results show that GuideRender outperforms baselines in navigating large-scale scenes. A user study further shows that our method satisfies the navigation requirements of large-scale scenes.
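The abstract does not specify the internals of the attentional guidance fusion module, so the following is only a minimal illustrative sketch of the general idea: per-modality features (frames, user inputs, objects) are weighted by softmax attention scores and summed into a single fused vector. All names and the similarity-based scoring are assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

def attention_fuse(frame_feat, input_feat, object_feat):
    """Fuse three modality feature vectors with softmax attention.

    Hypothetical sketch: scores each modality by its dot-product
    similarity to the mean context vector, turns the scores into
    softmax weights, and returns the weighted sum plus the weights.
    """
    feats = np.stack([frame_feat, input_feat, object_feat])  # (3, d)
    context = feats.mean(axis=0)                             # (d,)
    scores = feats @ context                                 # (3,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                 # softmax over modalities
    fused = (weights[:, None] * feats).sum(axis=0)           # (d,)
    return fused, weights

rng = np.random.default_rng(0)
d = 8
fused, w = attention_fuse(rng.normal(size=d),
                          rng.normal(size=d),
                          rng.normal(size=d))
```

In a learned module the scores would come from trained projections rather than a fixed similarity, but the weighting-and-summing structure is the same.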
Data availability
The datasets that support the current study are available from the corresponding author on reasonable request.
Funding
This work was supported by the National Key Research and Development Program of China under grant number 2022YFC2407000, the Interdisciplinary Program of Shanghai Jiao Tong University under grant numbers YG2023LC11 and YG2022ZD007, the National Natural Science Foundation of China under grant numbers 62272298 and 62077037, the College-level Project Fund of Shanghai Jiao Tong University Affiliated Sixth People's Hospital under grant number ynlc201909, and the Medical-industrial Cross-fund of Shanghai Jiao Tong University under grant number YG2022QN089.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file 1 (mp4 58376 KB)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Qin, Y., Chi, X., Sheng, B. et al. GuideRender: large-scale scene navigation based on multi-modal view frustum movement prediction. Vis Comput 39, 3597–3607 (2023). https://doi.org/10.1007/s00371-023-02922-x
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00371-023-02922-x