Abstract
Knowing the shape of objects is essential to many robotics tasks; however, complete shape information is not always available. Recent approaches based on point clouds and voxel cubes have been proposed for shape completion from a single depth view, but they tend to be computationally expensive and require tuning a large number of weights. This paper presents a novel architecture for shape completion based on six orthogonal views obtained from a point cloud (they can be seen as the six faces of a die). Our network uses one branch per orthogonal view as input–output and mixes them in the middle of the architecture. By using orthogonal views, the number of required parameters is significantly reduced. We also introduce a novel method to filter the output of networks based on orthogonal views and describe algorithms to convert an orthogonal view into a voxel cube or a point cloud. We compared our approach against state-of-the-art approaches on the YCB and ShapeNet datasets using the Chamfer distance and mean squared error measures and showed very competitive performance with less than 5% of their parameters.
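The core representation can be made concrete with a short sketch. The following is a minimal NumPy illustration, not the paper's implementation: it assumes a binary cubic voxel grid and records, for each of the six faces, the distance to the first occupied voxel along the corresponding axis (the paper additionally offsets these distances; see note 2). Function names are illustrative only.

import numpy as np

def orthogonal_view(voxels, axis, flip):
    # Depth of the first occupied voxel along `axis`, seen from one face.
    v = np.flip(voxels, axis=axis) if flip else voxels
    occupied = v.any(axis=axis)                     # rays that hit the shape
    depth = v.argmax(axis=axis).astype(np.float32)  # index of first occupied voxel
    return np.where(occupied, depth, 0.0)           # background encoded as 0

def six_orthogonal_views(voxels):
    # One depth image per face of the cube (the six faces of a die).
    return [orthogonal_view(voxels, axis, flip)
            for axis in range(3) for flip in (False, True)]

# Example: a 40^3 grid containing a small occupied block.
grid = np.zeros((40, 40, 40), dtype=np.uint8)
grid[10:20, 15:25, 5:30] = 1
views = six_orthogonal_views(grid)
print([v.shape for v in views])   # six 40x40 depth images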
Data availability
All data generated or analyzed during this study are included in this published article and are available at https://github.com/hideldt/SCBOV.
Notes
PointNet/PointNet++ [8, 9] were proposed to work directly on point clouds. Several works have built on their frameworks to solve problems such as semantic segmentation, classification, and point completion [3,4,5,6,7, 17]; in this work, we compare our results against [3], which is based on PointNet.
Note that parts of the shape can touch a face of the voxel cube. Our representation takes the distance to the first occupied voxel (as shown in Algorithm 2); if that distance is 0 (i.e., the shape touches a face of the voxel cube), those voxels would receive the same value as the background. For this reason, we add an offset, which was empirically set to four (a small sketch of this encoding appears after these notes).
They use PCN, which has 6.8M parameters, as their base network, so compared with them we use only 4.4% of their weights (6.8M × 0.044 ≈ 0.3M parameters).
Note that the comparison could not be made on a lower-capacity GPU, since both networks require at least 8 GB of VRAM to be trained.
This means that the inner points are not taken into account, only the border points.
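To make note 2 and the view-to-voxel conversion concrete, here is a minimal sketch under the same assumptions as above (binary cubic grid; illustrative function names, not the paper's code). It encodes a view with the offset of four from note 2 and decodes it back into the surface voxels of the cube:

import numpy as np

OFFSET = 4  # keeps depth 0 (shape touching a cube face) distinct from background 0

def encode_view(voxels, axis=0):
    # Distance to the first occupied voxel, shifted by OFFSET; background stays 0.
    occupied = voxels.any(axis=axis)
    depth = voxels.argmax(axis=axis).astype(np.float32)
    return np.where(occupied, depth + OFFSET, 0.0)

def view_to_voxels(view, size, axis=0):
    # Place one voxel per foreground pixel at the decoded depth (surface only).
    cube = np.zeros((size, size, size), dtype=np.uint8)
    ys, xs = np.nonzero(view)                     # foreground pixels
    depths = (view[ys, xs] - OFFSET).astype(int)  # undo the offset
    if axis == 0:
        cube[depths, ys, xs] = 1
    return cube

grid = np.zeros((40, 40, 40), dtype=np.uint8)
grid[0:12, 15:25, 5:30] = 1           # block touching the axis-0 face (depth 0)
view = encode_view(grid, axis=0)
recovered = view_to_voxels(view, 40, axis=0)
print(int(recovered.sum()), "surface voxels recovered")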
References
Varley J, DeChant C, Richardson A, Ruales J, Allen P (2017) Shape completion enabled robotic grasping. In: 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 2442–2447. https://doi.org/10.1109/iros.2017.8206060
Yang B, Rosa S, Markham A, Trigoni N, Wen H (2019) Dense 3D object reconstruction from a single depth view. IEEE Trans Pattern Anal Mach Intell 41(12):2820–2834. https://doi.org/10.1109/TPAMI.2018.2868195
Yuan W, Khot T, Held D, Mertz C, Hebert M (2018) PCN: point completion network. In: 2018 international conference on 3D vision (3DV), pp 728–737
Liu M, Sheng L, Yang S, Shao J, Hu S-M (2019) Morphing and sampling network for dense point cloud completion. In: The thirty-fourth AAAI conference on artificial intelligence
Peng Y, Chang M, Wang Q, Qian Y, Zhang Y, Wei M, Liao X (2020) Sparse-to-dense multi-encoder shape completion of unstructured point cloud. IEEE Access 8:30969–30978
Yu X, Rao Y, Wang Z, Liu Z, Lu J, Zhou J (2021) PoinTr: diverse point cloud completion with geometry-aware transformers. In: 2021 IEEE/CVF international conference on computer vision (ICCV), pp 12478–12487. https://doi.org/10.1109/ICCV48922.2021.01227
Xiang P, Wen X, Liu Y-S, Cao Y-P, Wan P, Zheng W, Han Z (2021) SnowflakeNet: point cloud completion by snowflake point deconvolution with skip-transformer. In: Proceedings of the IEEE international conference on computer vision (ICCV)
Charles RQ, Su H, Kaichun M, Guibas LJ (2017) PointNet: deep learning on point sets for 3D classification and segmentation. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 77–85. https://doi.org/10.1109/CVPR.2017.16
Qi CR, Yi L, Su H, Guibas LJ (2017) PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems, vol 30. Curran Associates Inc, USA
Hu T, Han Z, Shrivastava A, Zwicker M (2019) Render4Completion: synthesizing multi-view depth maps for 3D shape completion. In: 2019 IEEE/CVF international conference on computer vision workshop (ICCVW), pp 4114–4122. https://doi.org/10.1109/ICCVW.2019.00506
Hu T, Han Z, Zwicker M (2020) 3D shape completion with multi-view consistent inference. In: The Thirty-Fourth AAAI conference on artificial intelligence, AAAI 2020, The Thirty-Second innovative applications of artificial intelligence conference, IAAI 2020, The tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, 7–12 Feb 2020, pp 10997–11004
Chang AX, Funkhouser T, Guibas L, Hanrahan P, Huang Q, Li Z, Savarese S, Savva M, Song S, Su H, Xiao J, Yi L, Yu F (2015) ShapeNet: an information-rich 3D model repository. Technical Report arXiv:1512.03012 [cs.GR], Toyota Technological Institute, Chicago
Calli B, Singh A, Walsman A, Srinivasa S, Abbeel P, Dollar AM (2015) The ycb object and model set: towards common benchmarks for manipulation research. In: 2015 international conference on advanced robotics (ICAR), pp 510–517. https://doi.org/10.1109/ICAR.2015.7251504
Kappler D, Bohg J, Schaal S (2015) Leveraging big data for grasp planning. In: 2015 IEEE international conference on robotics and automation (ICRA), pp 4304–4311. https://doi.org/10.1109/ICRA.2015.7139793
Koenig N, Howard A (2004) Design and use paradigms for Gazebo, an open-source multi-robot simulator. In: 2004 IEEE/RSJ international conference on intelligent robots and systems (IROS) (IEEE Cat. No.04CH37566), vol 3, pp 2149–2154. https://doi.org/10.1109/IROS.2004.1389727
Min P (2019) Binvox. http://www.patrickmin.com/binvox or https://www.google.com/search?q=binvox. Accessed on 05 Oct 2019
Saha M, Amin SB, Sharma A, Kumar TKS, Kalia RK (2022) AI-driven quantification of ground glass opacities in lungs of COVID-19 patients using 3D computed tomography imaging. PLoS ONE 17:1–14. https://doi.org/10.1371/journal.pone.0263916
Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Proceedings of the Thirty-First AAAI conference on artificial intelligence. AAAI'17, pp 4278–4284
Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF (eds) Medical image computing and computer-assisted intervention—MICCAI 2015. Springer, Cham, pp 234–241
Riegler G, Ulusoy AO, Geiger A (2017) OctNet: learning deep 3D representations at high resolutions. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 6620–6629. https://doi.org/10.1109/CVPR.2017.701
Choy CB, Xu D, Gwak J, Chen K, Savarese S (2016) 3D-R2N2: a unified approach for single and multi-view 3D object reconstruction. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer vision—ECCV 2016. Springer, Cham, pp 628–644
Rusu RB, Cousins S (2011) 3D is here: point cloud library (PCL). In: IEEE international conference on robotics and automation (ICRA). Shanghai, China, pp 1–4
Han X, Li Z, Huang H, Kalogerakis E, Yu Y (2017) High-resolution shape completion using deep neural networks for global structure and local geometry inference. In: 2017 IEEE international conference on computer vision (ICCV), pp 85–93. https://doi.org/10.1109/ICCV.2017.19
Oliphant T (2006) A guide to NumPy. Trelgol Publishing, USA
Kingma D, Ba J (2014) Adam: a method for stochastic optimization. In: International conference on learning representations. arXiv:1412.6980
Chollet F (2021) Deep learning with Python, 2nd edn. Manning Publications Co. ISBN 9781617296864
Abadi M, Agarwal A et al (2015) TensorFlow: large-scale machine learning on heterogeneous systems. Software available from https://www.tensorflow.org/
Chollet F (2017) Deep learning with Python, 1st edn. Manning Publications Co., Greenwich
Do T-T, Nguyen A, Reid I (2018) AffordanceNet: an end-to-end deep learning approach for object affordance detection. In: 2018 IEEE international conference on robotics and automation (ICRA), pp 5882–5889. https://doi.org/10.1109/icra.2018.8460902
Acknowledgements
L. Delgado acknowledges CONACYT for the scholarship granted to pursue his PhD studies (Grant Number 707984).
Ethics declarations
Conflict of interest
The authors declare they have no financial interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Delgado, L., Morales, E.F. Shape completion using orthogonal views through a multi-input–output network. Pattern Anal Applic 26, 1045–1057 (2023). https://doi.org/10.1007/s10044-023-01154-y