DOI: 10.1145/3609703.3609705
research-article

Reconstructing 3D Shapes as an Union of Boxes from Multi-View Images

Published: 16 August 2023

Abstract

The task of reconstructing object shapes from input images has become increasingly important in various fields, such as computer vision, robotics, augmented reality, video games, and autonomous vehicles. While approaches for reconstructing shapes with varying levels of detail have been proposed, balancing representation accuracy and model complexity remains a challenge. To address this challenge, we propose an end-to-end approach for reconstructing object shapes from multiple images using a union of box primitives. Our approach offers a simpler and more efficient 3D representation of objects without the need for intermediate products such as voxels, resulting in faster inference times. Additionally, we introduce an auxiliary task to aid in learning how to extract and transform spatial features from images without requiring camera calibrations. Extensive experiments demonstrate that our method can produce comparable results to approaches that require 3D voxelized input while utilizing only 2D RGB images as input. Furthermore, our method significantly outperforms the aforementioned approaches in terms of inference time.
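To make the idea of a "union of box primitives" concrete, here is a minimal sketch. It is not the authors' model: the abstract does not specify how boxes are parameterized or how the network predicts them. The sketch assumes axis-aligned boxes inside the unit cube, and the names `Box`, `in_union`, `occupancy`, and `iou` are hypothetical. It shows only how a handful of box parameters describe a shape, and how that shape can be compared with a voxel grid purely for evaluation.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Box:
    """Axis-aligned box primitive (assumed parameterization, no rotation)."""
    center: tuple  # (x, y, z)
    half: tuple    # half-extent along each axis


    def contains(self, p):
        # A point is inside when it lies within the half-extent on every axis.
        return all(abs(pi - ci) <= hi
                   for pi, ci, hi in zip(p, self.center, self.half))


def in_union(boxes, p):
    """A point belongs to the shape if any primitive contains it."""
    return any(b.contains(p) for b in boxes)


def occupancy(boxes, n):
    """Sample the union at the cell centers of an n*n*n grid over [0, 1]^3."""
    coords = [(i + 0.5) / n for i in range(n)]
    return [in_union(boxes, p) for p in product(coords, repeat=3)]


def iou(a, b):
    """Intersection over union of two occupancy lists of equal length."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 1.0


# An L-shaped object described by two boxes, i.e. 12 numbers in total.
base = Box(center=(0.5, 0.5, 0.25), half=(0.5, 0.5, 0.25))
back = Box(center=(0.25, 0.5, 0.75), half=(0.25, 0.5, 0.25))
l_shape = [base, back]

if __name__ == "__main__":
    full = occupancy(l_shape, 4)
    partial = occupancy([base], 4)
    print(sum(full), "of", len(full), "cells occupied")
    print("IoU of base alone vs. full shape:", iou(partial, full))
```

In this sketch the shape itself is stored as box parameters only; the voxel grid is built after the fact to score one shape against another, which is one way to read the abstract's claim that no intermediate voxel product is needed.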


    Published In

    PRIS '23: Proceedings of the 2023 5th International Conference on Pattern Recognition and Intelligent Systems
    July 2023
    123 pages
    ISBN: 9781450399968
    DOI: 10.1145/3609703

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. neural networks
    2. shape reconstruction

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    PRIS 2023
