DOI: 10.1145/3474085.3475244

Space-Angle Super-Resolution for Multi-View Images

Published: 17 October 2021

Abstract

The limited spatial and angular resolution of multi-view multimedia applications restricts the visual experience they can deliver in practice. In this paper, we first formulate the space-angle super-resolution (SASR) problem for irregularly arranged multi-view images: jointly increasing the spatial resolution of the source views and synthesizing arbitrary virtual high-resolution (HR) views between them. One feasible solution is to apply super-resolution (SR) and view synthesis (VS) methods separately; however, this cannot fully exploit the interplay between the two tasks. Intuitively, multi-view images provide more angular references, and higher resolution provides more high-frequency details. We therefore propose a one-stage space-angle super-resolution network, SASRnet, which synthesizes real and virtual HR views simultaneously. Extensive experiments on several benchmarks demonstrate that our proposed method outperforms two-stage methods and show that SR and VS can promote each other. To our knowledge, this work is the first to address the SASR problem for unstructured multi-view images in an end-to-end learning-based manner.
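To make the problem setup concrete, the sketch below illustrates the two-stage baseline the abstract contrasts against: super-resolve each source view, then synthesize a virtual view between them. This is a toy illustration of the data flow only, not the paper's SASRnet; the names `upsample_nearest`, `blend_views`, and `two_stage_sasr` are hypothetical, and nearest-neighbor upsampling plus linear blending stand in for the learned SR and VS networks.

```python
import numpy as np

def upsample_nearest(img, scale):
    # Toy stand-in for a spatial SR network: nearest-neighbor upsampling.
    return img.repeat(scale, axis=0).repeat(scale, axis=1)

def blend_views(left, right, alpha):
    # Toy stand-in for view synthesis: linear blend at virtual
    # angular position alpha in [0, 1] between two source views.
    return (1.0 - alpha) * left + alpha * right

def two_stage_sasr(lr_views, scale, alpha):
    # Stage 1 (SR): super-resolve each low-resolution source view.
    hr_views = [upsample_nearest(v, scale) for v in lr_views]
    # Stage 2 (VS): synthesize a virtual HR view between the sources.
    # A one-stage method would instead produce both outputs jointly,
    # letting the SR and VS sub-tasks share information.
    virtual = blend_views(hr_views[0], hr_views[1], alpha)
    return hr_views, virtual

# Two 16x16 RGB source views, upscaled 4x, virtual view at the midpoint.
lr = [np.random.rand(16, 16, 3), np.random.rand(16, 16, 3)]
hr_views, virtual = two_stage_sasr(lr, scale=4, alpha=0.5)
print(hr_views[0].shape, virtual.shape)  # (64, 64, 3) (64, 64, 3)
```

Because the two stages run independently here, the VS step cannot feed angular references back into SR, which is the intra-task relationship the one-stage formulation is designed to exploit.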


Cited By

  • Superpixel-based Efficient Sampling for Learning Neural Fields from Large Input. In Proceedings of the 32nd ACM International Conference on Multimedia (2024), 10421-10430. DOI: 10.1145/3664647.3681299. Online publication date: 28-Oct-2024.

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. multi-view images
  2. super-resolution
  3. view synthesis

Qualifiers

  • Research-article

Conference

MM '21
Sponsor: MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
