
research-article

Reconstructing 3D Shapes as an Union of Boxes from Multi-View Images

Authors:

Minglun Gong

PRIS '23: Proceedings of the 2023 5th International Conference on Pattern Recognition and Intelligent Systems

Pages 6–12

https://doi.org/10.1145/3609703.3609705

Published: 16 August 2023

Abstract

Reconstructing object shapes from images has become increasingly important in fields such as computer vision, robotics, augmented reality, video games, and autonomous vehicles. Approaches that reconstruct shapes at varying levels of detail have been proposed, but balancing representation accuracy against model complexity remains a challenge. To address this, we propose an end-to-end approach that reconstructs object shapes from multiple images as a union of box primitives. This yields a simpler and more efficient 3D representation that does not require intermediate products such as voxels, resulting in faster inference. Additionally, we introduce an auxiliary task that helps the network learn to extract and transform spatial features from images without requiring camera calibration. Extensive experiments demonstrate that our method produces results comparable to those of approaches that require 3D voxelized input, while using only 2D RGB images as input. Furthermore, our method is significantly faster at inference than those approaches.
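The abstract does not say how the predicted boxes are parameterized or how results are scored, so the following is only a minimal sketch under stated assumptions. It assumes each box is described by a center, half-extents, and a single yaw angle about the vertical axis, and it uses voxel intersection-over-union as an illustrative comparison metric (the abstract does not name the paper's metric). The names `Box`, `point_in_box`, `point_in_union`, `voxelize`, and `voxel_iou` are hypothetical and are not taken from the paper. The sketch shows what "a union of boxes" means as an occupancy function: a point is occupied if it lies inside any box.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Box:
    """Hypothetical box primitive: center, half-extents, and yaw (rotation about z, radians)."""
    center: tuple[float, float, float]
    half: tuple[float, float, float]
    yaw: float = 0.0


def point_in_box(p, box):
    """True if point p lies inside or on the surface of the box."""
    dx, dy, dz = (p[i] - box.center[i] for i in range(3))
    # Rotate the offset by -yaw to express it in the box's local frame.
    c, s = math.cos(box.yaw), math.sin(box.yaw)
    lx = c * dx + s * dy
    ly = -s * dx + c * dy
    return (abs(lx) <= box.half[0]
            and abs(ly) <= box.half[1]
            and abs(dz) <= box.half[2])


def point_in_union(p, boxes):
    """A point is occupied by the union if it lies inside at least one box."""
    return any(point_in_box(p, b) for b in boxes)


def voxelize(boxes, resolution=32, lo=-1.0, hi=1.0):
    """Set of (i, j, k) cells of a resolution^3 grid over [lo, hi]^3 whose centers are occupied."""
    step = (hi - lo) / resolution
    occupied = set()
    for i, j, k in product(range(resolution), repeat=3):
        center = tuple(lo + (n + 0.5) * step for n in (i, j, k))
        if point_in_union(center, boxes):
            occupied.add((i, j, k))
    return occupied


def voxel_iou(a, b):
    """Intersection over union of two sets of occupied voxel indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Usage: a two-box, chair-like shape (seat plus backrest).
seat = Box(center=(0.0, 0.0, -0.4), half=(0.5, 0.5, 0.1))
back = Box(center=(0.0, -0.4, 0.1), half=(0.5, 0.1, 0.4))
chair = [seat, back]

print(point_in_union((0.0, 0.0, -0.4), chair))  # True: center of the seat
print(point_in_union((0.0, 0.4, 0.4), chair))   # False: outside both boxes

grid_chair = voxelize(chair, resolution=16)
grid_seat = voxelize([seat], resolution=16)
print(round(voxel_iou(grid_chair, grid_seat), 3))  # 0.571 (128 of 224 cells)
```

Because occupancy of a union is a logical OR over primitives, each box needs only a handful of numbers. With 7 values per box in this sketch, the two-box shape above is described by 14 values, whereas a 32³ occupancy grid has 32,768 cells. This is one way to read the abstract's claim that box unions are simpler than voxel representations.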


Index Terms

1. Computing methodologies
  1. Computer graphics
    1. Shape modeling
      1. Shape analysis


Information & Contributors

Information

Published In

ACM Other conferences

PRIS '23: Proceedings of the 2023 5th International Conference on Pattern Recognition and Intelligent Systems

July 2023

123 pages

ISBN: 9781450399968

DOI: 10.1145/3609703

Editors: Wenbing Zhao, Xinguo Yu

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Research-article
Research
Refereed limited

Conference

PRIS 2023

PRIS 2023: 2023 5th International Conference on Pattern Recognition and Intelligent Systems

July 28 - 30, 2023

Shenyang, China


Bibliometrics

Article Metrics

Total Citations: 0
Total Downloads: 31
Downloads (last 12 months): 7
Downloads (last 6 weeks): 0

Reflects downloads up to 05 Mar 2025
