Abstract
Novel view synthesis of static scenes has achieved remarkable advancements in producing photo-realistic results. However, key challenges remain for immersive rendering of dynamic scenes. One of the seminal image-based rendering methods, the multi-plane image (MPI), achieves high novel-view synthesis quality for static scenes, but modelling dynamic content with MPI has not been studied. In this paper, we propose a novel Temporal-MPI representation that encodes the rich 3D and dynamic variation information throughout an entire video as jointly learned compact temporal bases and coefficients. The MPI for any time instance can then be assembled within milliseconds as a linear combination of the temporal bases and coefficients, so that novel views at arbitrary time instances can be rendered in real time with high visual quality. Our method is trained and evaluated on the Nvidia Dynamic Scene Dataset. We show that our proposed Temporal-MPI is significantly faster and more compact than other state-of-the-art dynamic scene modelling methods.
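As a rough illustration of the linear-combination step described above, the sketch below assembles a time-instance MPI from per-pixel coefficients and a shared temporal basis. All names, tensor shapes, the RGBA plane parameterisation, and the basis size are illustrative assumptions, not the paper's actual implementation:

```python
import torch

# A minimal sketch of assembling a time-instance MPI from jointly
# learned temporal bases and coefficients. Shapes and names are
# hypothetical stand-ins, not the paper's implementation.
D, H, W, K, T = 32, 270, 480, 8, 24   # planes, resolution, basis size, frames

# Per-plane, per-pixel RGBA coefficients over K temporal basis functions,
# and the shared basis giving one K-vector of weights per time instance.
coefficients = torch.randn(D, H, W, 4, K)
temporal_basis = torch.randn(T, K)

def mpi_at_time(t: int) -> torch.Tensor:
    """Linearly combine the K basis weights at frame t with the
    coefficients to produce one RGBA multi-plane image (D, H, W, 4)."""
    return coefficients @ temporal_basis[t]   # (..., 4, K) @ (K,) -> (..., 4)

mpi = mpi_at_time(5)
print(mpi.shape)   # torch.Size([32, 270, 480, 4])
```

Once a time-instance MPI is assembled this way, the standard static-scene MPI rendering pipeline (homography warping and back-to-front alpha compositing) can presumably be applied unchanged, which is what makes per-frame novel-view rendering fast.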
Acknowledgments
The research was supported by the Theme-based Research Scheme, Research Grants Council of Hong Kong (T45-205/21-N).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Xing, W., Chen, J. (2022). Temporal-MPI: Enabling Multi-plane Images for Dynamic Scene Modelling via Temporal Basis Learning. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13675. Springer, Cham. https://doi.org/10.1007/978-3-031-19784-0_19
DOI: https://doi.org/10.1007/978-3-031-19784-0_19
Print ISBN: 978-3-031-19783-3
Online ISBN: 978-3-031-19784-0