Three-Dimensional Dense Reconstruction: A Review of Algorithms and Datasets
Figure 1. Structure from motion.
Figure 2. Shape from shading.
Figure 3. Simultaneous localization and mapping for dense visual reconstruction.
Figure 4. Architecture of deep multi-view stereo.
Figure 5. Architecture of 3D UNet.
Figure 6. Architecture of deep visual odometry.
Figure 7. Architecture of PointNet.
Figure 8. Architecture of 3D generative adversarial network.
Figure 9. Architecture of the encoder of 3D autoencoder. The decoder has the same architecture.
Abstract
1. Introduction
2. Geometrical 3D Reconstruction
2.1. Overview
- Image Acquisition: The first step is capturing multiple images or video frames of the scene or object from different angles and viewpoints. The quality and resolution of these images are crucial, as they directly impact the accuracy of the final 3D model.
- Feature Detection and Matching: Distinctive features or keypoints are identified within the images, and corresponding features across different images are matched. Common algorithms for this step include SIFT, SURF, and ORB.
- Camera Pose Estimation: Once feature correspondences are established, the relative positions and orientations of the cameras are estimated. This step is essential for reconstructing the geometry of the scene and is typically performed using essential matrix estimation, homography estimation, or bundle adjustment (a code sketch covering steps two to four appears below this list).
- Depth Estimation: Depth information for each pixel is calculated, often using stereo matching or multi-view stereo techniques. This step generates a dense point cloud that represents the 3D structure of the scene or object.
- Surface Reconstruction: The dense point cloud is then transformed into a 3D mesh that represents the surface of the object or scene. Algorithms such as Poisson surface reconstruction or marching cubes are commonly used in this process.
- Texturing: In the final step, color and texture information from the original images are applied to the 3D mesh, creating a photorealistic 3D model.
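The following minimal two-view sketch in Python with OpenCV illustrates steps two through four of the pipeline above: ORB feature detection and matching, relative pose estimation from the essential matrix, and triangulation into a sparse point cloud. The image file names and the intrinsic matrix K are placeholders, and a full pipeline would add bundle adjustment and dense stereo on top of this skeleton.

```python
import cv2
import numpy as np

# Placeholder intrinsics and input views (illustrative assumptions)
K = np.array([[718.0, 0.0, 607.0], [0.0, 718.0, 185.0], [0.0, 0.0, 1.0]])
img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

# Step 2: detect and match ORB keypoints across the two views
orb = cv2.ORB_create(4000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Step 3: essential matrix with RANSAC, then relative camera pose (R, t)
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)

# Step 4: triangulate the correspondences into 3D points (up to scale)
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
points3d = (pts4d[:3] / pts4d[3]).T  # (N, 3) sparse structure
```

Dense depth estimation, surface reconstruction, and texturing then operate on many such views rather than a single pair.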
2.2. Structure from Motion
2.3. Shape from Shading
- Lambertian Surface: The object’s surface is assumed to follow a Lambertian reflectance model, reflecting light equally in all directions, with the reflected light intensity depending only on the angle between the light source and the surface normal [29] (formalized in the equation below this list).
- Known Lighting Conditions: The position, intensity, and color of the light source(s) are assumed to be known or estimated.
- Smooth Surface: The object’s surface is assumed to be smooth, with continuous variations in depth and surface normals.
- Single Image: SfS operates on a single image, unlike other 3D reconstruction techniques that rely on multiple images or stereo pairs.
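Under these assumptions, pixel brightness is tied to surface orientation by the standard Lambertian image irradiance equation, written here for a single distant light source; ρ is the albedo, n(x, y) the unit surface normal, and l the unit light direction (generic notation, not taken from a specific reference).

```latex
% Lambertian image irradiance for a single distant light source
I(x, y) = \rho \, \max\bigl(0,\; \mathbf{n}(x, y) \cdot \mathbf{l}\bigr)
```

SfS inverts this relation: given the image I and known or estimated ρ and l, it recovers the normal field n and integrates it into a depth map, with the smoothness assumption resolving the remaining ambiguity.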
2.4. SLAM
3. Deep-Learning-Based 3D Dense Reconstruction
3.1. Convolutional Neural Networks
3.2. Three-Dimensional Convolutional Neural Networks (3D-CNNs)
3.3. Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM)
3.4. Graph Neural Networks (GNNs)
- Input: An unordered set of 3D points, each represented by its x, y, and z coordinates.
- Transformation Networks (T-Nets): These are mini-PointNets that learn spatial transformations to align the input point cloud. There are two T-Nets: the first predicts a 3 × 3 transformation matrix to align the point cloud and the second predicts a 64 × 64 matrix to align the features.
- Multi-Layer Perceptrons (MLPs): Fully connected layers that learn local features for each input point. The architecture includes several MLP layers with varying numbers of neurons (e.g., 64, 128, or 1024), applying a shared weight function to each point independently, which ensures permutation invariance.
- Max Pooling: A symmetric function that aggregates local features into a global point cloud feature. Max pooling captures the most salient features of the input point cloud.
- Fully Connected Layers and Output (MLPs): These layers process the global point cloud feature to generate the final output. For classification tasks, the output layer has as many neurons as there are object classes, while, for segmentation tasks, the output layer produces per-point scores. A minimal code sketch of this architecture is given below the list.
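The sketch below is a minimal PointNet-style classifier in PyTorch; the T-Nets are omitted for brevity, and the layer widths (64, 128, 1024) follow the description above. The class name and sizes are illustrative assumptions, not the reference implementation.

```python
import torch
import torch.nn as nn

class SimplePointNet(nn.Module):
    """Minimal PointNet-style classifier (T-Nets omitted for brevity)."""
    def __init__(self, num_classes=10):
        super().__init__()
        # Shared per-point MLP implemented as 1x1 convolutions (64 -> 128 -> 1024)
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.BatchNorm1d(1024), nn.ReLU(),
        )
        # Classification head operating on the global feature
        self.head = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, points):          # points: (B, N, 3), unordered
        x = points.transpose(1, 2)      # (B, 3, N) for Conv1d
        x = self.mlp(x)                 # (B, 1024, N) per-point features
        x = torch.max(x, dim=2).values  # symmetric max pooling -> (B, 1024)
        return self.head(x)             # (B, num_classes) class scores

# Usage: classify a batch of 2 point clouds with 1024 points each
logits = SimplePointNet()(torch.randn(2, 1024, 3))
```

Because the per-point MLP is shared and max pooling is order-independent, permuting the input points leaves the output unchanged, which is the core property the list above describes.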
3.5. Generative Adversarial Networks (GANs)
- Generator: A 3D Convolutional Neural Network (CNN) that takes a random noise vector as input and produces a 3D object as output. It uses transposed 3D convolutional layers for upsampling, followed by batch normalization and ReLU activation functions. The architecture resembles a 3D U-Net, incorporating skip connections between corresponding layers to refine the generated shapes.
- Discriminator: A 3D CNN that classifies the generated 3D object as either real (from the training dataset) or fake (produced by the generator). It consists of several 3D convolutional layers with batch normalization, leaky ReLU activation functions, and a final fully connected layer with a sigmoid activation function. A simplified generator–discriminator sketch is given below the list.
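A simplified PyTorch sketch of such a 3D GAN is shown below for 32 × 32 × 32 occupancy grids; the U-Net-style skip connections mentioned above are omitted, and the layer sizes and noise dimension are illustrative assumptions rather than the configuration of any specific published model.

```python
import torch
import torch.nn as nn

class VoxelGenerator(nn.Module):
    """Maps a noise vector to a 32x32x32 occupancy grid (skip connections omitted)."""
    def __init__(self, z_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose3d(z_dim, 256, 4, 1, 0), nn.BatchNorm3d(256), nn.ReLU(),  # -> 4^3
            nn.ConvTranspose3d(256, 128, 4, 2, 1), nn.BatchNorm3d(128), nn.ReLU(),    # -> 8^3
            nn.ConvTranspose3d(128, 64, 4, 2, 1), nn.BatchNorm3d(64), nn.ReLU(),      # -> 16^3
            nn.ConvTranspose3d(64, 1, 4, 2, 1), nn.Sigmoid(),                         # -> 32^3
        )

    def forward(self, z):                            # z: (B, z_dim)
        return self.net(z.view(z.size(0), -1, 1, 1, 1))

class VoxelDiscriminator(nn.Module):
    """Classifies a 32x32x32 grid as real or generated."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 64, 4, 2, 1), nn.LeakyReLU(0.2),                          # -> 16^3
            nn.Conv3d(64, 128, 4, 2, 1), nn.BatchNorm3d(128), nn.LeakyReLU(0.2),   # -> 8^3
            nn.Conv3d(128, 256, 4, 2, 1), nn.BatchNorm3d(256), nn.LeakyReLU(0.2),  # -> 4^3
            nn.Flatten(),
            nn.Linear(256 * 4 ** 3, 1), nn.Sigmoid(),                              # real/fake probability
        )

    def forward(self, voxels):                       # voxels: (B, 1, 32, 32, 32)
        return self.net(voxels).view(-1)

# Usage: generate two fake shapes and score them
fake = VoxelGenerator()(torch.randn(2, 128))         # (2, 1, 32, 32, 32)
score = VoxelDiscriminator()(fake)                   # (2,) probabilities
```

Training alternates between updating the discriminator on real and generated grids and updating the generator to fool it, as in a standard GAN objective.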
3.6. Autoencoders and Variational Autoencoders (VAEs)
- Encoder: The input to the encoder is typically a 3D representation, such as a voxel grid (3D binary or scalar grid), point cloud (a set of points in 3D space), or a mesh. For voxel-based inputs, 3D convolutional layers are used to capture spatial features from the 3D grid. These layers reduce the spatial dimensions while increasing the depth of the feature maps. In the case of point clouds, layers like PointNet or PointNet++ might be used to extract features directly from the unordered set of points. After the convolutional layers, fully connected (dense) layers are used to further compress the feature representation into a latent space. This latent space is a lower-dimensional representation of the 3D input.
- Latent Space: The latent space, also known as the bottleneck layer, contains the compressed representation of the input data. It is typically a vector of fixed size that encodes the most important features necessary for reconstructing the original 3D structure. The size of the latent space is a crucial parameter that balances compression and reconstruction accuracy.
- Decoder: The decoder begins with fully connected layers that take the latent space vector as input and gradually expand it back to the dimensions of the original 3D representation. For voxel-based inputs, 3D deconvolutional (transposed convolutional) layers are used to upsample the feature maps and reconstruct the 3D structure. These layers progressively increase the spatial dimensions back to the size of the original input. The output layer produces the final 3D reconstruction, typically in the form of a voxel grid, point cloud, or mesh, depending on the original input format. A minimal encoder–decoder sketch is given below the list.
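The following PyTorch sketch shows a minimal voxel-based 3D autoencoder with the encoder–latent–decoder structure described above; the 32³ grid size, the 128-dimensional latent code, and the class name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VoxelAutoencoder(nn.Module):
    """Minimal 3D convolutional autoencoder for 32^3 occupancy grids."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 32, 4, 2, 1), nn.ReLU(),     # 32^3 -> 16^3
            nn.Conv3d(32, 64, 4, 2, 1), nn.ReLU(),    # 16^3 -> 8^3
            nn.Conv3d(64, 128, 4, 2, 1), nn.ReLU(),   # 8^3  -> 4^3
            nn.Flatten(),
            nn.Linear(128 * 4 ** 3, latent_dim),      # bottleneck vector
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 4 ** 3), nn.ReLU(),
            nn.Unflatten(1, (128, 4, 4, 4)),
            nn.ConvTranspose3d(128, 64, 4, 2, 1), nn.ReLU(),   # 4^3  -> 8^3
            nn.ConvTranspose3d(64, 32, 4, 2, 1), nn.ReLU(),    # 8^3  -> 16^3
            nn.ConvTranspose3d(32, 1, 4, 2, 1), nn.Sigmoid(),  # 16^3 -> 32^3
        )

    def forward(self, voxels):             # voxels: (B, 1, 32, 32, 32)
        z = self.encoder(voxels)           # latent code (B, latent_dim)
        return self.decoder(z), z

# Usage: reconstruct a batch of two random occupancy grids
recon, z = VoxelAutoencoder()(torch.rand(2, 1, 32, 32, 32))
```

A variational variant would replace the single latent vector with a predicted mean and variance and add a KL-divergence term to the reconstruction loss.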
3.7. Neural Radiance Fields (NeRFs)
- High-Quality Reconstructions: NeRFs can produce highly detailed and photorealistic 3D reconstructions by accurately modeling complex lighting and appearance details. This results in high-quality visual outputs that capture fine textures and intricate scene details.
- Continuous Representation: NeRFs represent 3D scenes as continuous volumetric functions, allowing for smooth interpolation and fine details that are challenging to capture with discrete representations like voxel grids.
- View Synthesis: NeRFs excel at synthesizing novel views of a scene, making them effective for applications that require generating images from new viewpoints not included in the training data.
- Flexibility: NeRFs can handle various scene types and can be adapted to different input modalities, such as RGB images and depth maps, enhancing their versatility.
- Computational Cost: Training a NeRF model can be computationally expensive and time-consuming, requiring significant GPU resources and memory. This is due to the need for fine-grained volumetric sampling and the complex nature of the optimization process.
- Data Requirements: NeRFs require a large number of input images from diverse viewpoints to produce accurate and detailed reconstructions. Acquiring and processing these images can be challenging and resource-intensive.
- Inference Speed: While NeRFs generate high-quality reconstructions, the inference process can be slow, as it involves querying the neural network for many points in the volume during rendering (the discrete volume rendering equation that aggregates these samples is given below this list).
- Limited Novel Shape Generation: NeRFs are typically trained on existing scenes and may not generalize well to generating novel shapes or objects that were not part of the training data.
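The rendering cost noted above stems from the volume rendering step used by NeRFs: for each camera ray, the MLP is queried at many sample points, and the predicted densities σ_i and colors c_i are composited into a pixel color. The standard discrete formulation is shown below, where δ_i is the spacing between adjacent samples along the ray and T_i the accumulated transmittance.

```latex
% Discrete volume rendering of a ray r sampled at N points
\hat{C}(\mathbf{r}) \;=\; \sum_{i=1}^{N} T_i \,\bigl(1 - e^{-\sigma_i \delta_i}\bigr)\, \mathbf{c}_i,
\qquad
T_i \;=\; \exp\!\Bigl(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Bigr)
```

Training minimizes the photometric error between the rendered color and the observed pixel color over many rays, which is why NeRFs need many input views and many samples per ray.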
3.8. Transformer
4. Dataset for Deep-Learning-Based 3D Dense Reconstruction
4.1. Dataset Review
4.2. Algorithms and Dataset
5. Discussion
- Low Image Quality: Images with low resolution, noise, or poor lighting conditions can adversely affect the performance of feature detection and matching algorithms, leading to inaccurate depth estimation and flawed reconstructions. High-quality images are crucial for robust 3D dense reconstruction [118,119,120,121,122].
- Deformation: Non-rigid or deformable objects, such as fabric or human bodies, can lead to inconsistencies in the reconstruction process. Deformations may alter an object’s appearance between views, complicating the establishment of correct feature correspondences and accurate 3D structure estimation [123,124,125].
- Adverse Illumination Conditions: Difficult lighting conditions, such as shadows, glare, or over- and under-exposure, can negatively impact feature detection and matching algorithms. Reflective or transparent surfaces may create misleading feature matches due to appearance changes depending on the viewpoint. Robust algorithms need to handle these challenging conditions to ensure accurate reconstruction [131,132,133].
Funding
Conflicts of Interest
References
- Lin, Y.; Tremblay, J.; Tyree, S.; Vela, P.A.; Birchfield, S. Multi-view fusion for multi-level robotic scene understanding. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; IEEE: New York, NY, USA, 2021; pp. 6817–6824. [Google Scholar]
- Li, Y.; Hannaford, B. Gaussian Process Regression for Sensorless Grip Force Estimation of Cable-Driven Elongated Surgical Instruments. IEEE Robot. Autom. Lett. 2017, 2, 1312–1319. [Google Scholar] [CrossRef] [PubMed]
- Tian, Y.; Chang, Y.; Arias, F.H.; Nieto-Granda, C.; How, J.P.; Carlone, L. Kimera-multi: Robust, distributed, dense metric-semantic slam for multi-robot systems. IEEE Trans. Robot. 2022, 38, 2022–2038. [Google Scholar] [CrossRef]
- Florence, P.R.; Manuelli, L.; Tedrake, R. Dense object nets: Learning dense visual object descriptors by and for robotic manipulation. arXiv 2018, arXiv:1806.08756. [Google Scholar]
- Li, Y.; Konuthula, N.; Humphreys, I.M.; Moe, K.; Hannaford, B.; Bly, R. Real-time virtual intraoperative CT in endoscopic sinus surgery. Int. J. Comput. Assist. Radiol. Surg. 2022, 17, 249–260. [Google Scholar] [CrossRef] [PubMed]
- Wei, R.; Li, B.; Mo, H.; Lu, B.; Long, Y.; Yang, B.; Dou, Q.; Liu, Y.; Sun, D. Stereo dense scene reconstruction and accurate localization for learning-based navigation of laparoscope in minimally invasive surgery. IEEE Trans. Biomed. Eng. 2022, 70, 488–500. [Google Scholar] [CrossRef]
- Mane, T.; Bayramova, A.; Daniilidis, K.; Mordohai, P.; Bernardis, E. Single-camera 3D head fitting for mixed reality clinical applications. Comput. Vis. Image Underst. 2022, 218, 103384. [Google Scholar] [CrossRef]
- Zillner, J.; Mendez, E.; Wagner, D. Augmented reality remote collaboration with dense reconstruction. In Proceedings of the 2018 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), Munich, Germany, 16–20 October 2018; IEEE: New York, NY, USA, 2018; pp. 38–39. [Google Scholar]
- Mossel, A.; Kroeter, M. Streaming and exploration of dynamically changing dense 3d reconstructions in immersive virtual reality. In Proceedings of the 2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR-Adjunct), Merida, Mexico, 19–23 September 2016; IEEE: New York, NY, USA, 2016; pp. 43–48. [Google Scholar]
- Geiger, A.; Ziegler, J.; Stiller, C. Stereoscan: Dense 3d reconstruction in real-time. In Proceedings of the 2011 IEEE Intelligent Vehicles Symposium (IV), Baden-Baden, Germany, 5–9 June 2011; IEEE: New York, NY, USA, 2011; pp. 963–968. [Google Scholar]
- Zeng, X.; Peng, X.; Qiao, Y. Df2net: A dense-fine-finer network for detailed 3d face reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2315–2324. [Google Scholar]
- Deák, G. Photogrammetry: Past, present, and future. J. Photogramm. Remote Sens. 2018, 143, 153–164. [Google Scholar]
- Luhmann, T.; Robson, S.; Kyle, S.; Harley, I. Close-Range Photogrammetry and 3D Imaging; Walter de Gruyter GmbH & Co. KG: Berlin, Germany, 2016. [Google Scholar]
- Horn, B.K. Robot Vision; MIT Press: Cambridge, MA, USA, 1986. [Google Scholar]
- Li, Y.; Olson, E.B. A general purpose feature extractor for light detection and ranging data. Sensors 2010, 10, 10356–10375. [Google Scholar] [CrossRef]
- Faugeras, O. Three-Dimensional Computer Vision: A Geometric Viewpoint; MIT Press: Cambridge, MA, USA, 1993. [Google Scholar]
- Bolles, R.C.; Baker, H.H.; Marimont, D.H. Epipolar-plane image analysis: An approach to determining structure from motion. Int. J. Comput. Vis. 1987, 1, 7–55. [Google Scholar] [CrossRef]
- Seitz, S.M.; Dyer, C.R. Photorealistic scene reconstruction by voxel coloring. Int. J. Comput. Vis. 1999, 35, 151–173. [Google Scholar] [CrossRef]
- Szeliski, R. Computer Vision: Algorithms and Applications; Springer Science & Business Media: Berlin, Germany, 2010. [Google Scholar]
- Chang, A.X.; Funkhouser, T.; Guibas, L.; Hanrahan, P.; Huang, Q.; Li, Z.; Savarese, S.; Savva, M.; Song, S.; Su, H.; et al. Shapenet: An information-rich 3d model repository. arXiv 2015, arXiv:1512.03012. [Google Scholar]
- Dai, A.; Chang, A.X.; Savva, M.; Halber, M.; Funkhouser, T.; Nießner, M. ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE: New York, NY, USA, 2017. [Google Scholar]
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
- Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. Speeded-up robust features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359. [Google Scholar] [CrossRef]
- Mur-Artal, R.; Montiel, J.; Tardós, J.D. Orb-slam: A versatile and accurate monocular slam system. IEEE Trans. Robot. 2015, 31, 1147–1163. [Google Scholar] [CrossRef]
- Neira, J.; Tardos, J. Data association in stochastic mapping using the joint compatibility test. IEEE Trans. Robot. Autom. 2001, 17, 890–897. [Google Scholar] [CrossRef]
- Li, Y.; Olson, E.B. IPJC: The incremental posterior joint compatibility test for fast feature cloud matching. In Proceedings of the Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference, Vilamoura, Portugal, 7–12 October 2012; IEEE: New York, NY, USA, 2012; pp. 3467–3474. [Google Scholar]
- Li, Y.; Li, S.; Song, Q.; Liu, H.; Meng, M.Q.H. Fast and robust data association using posterior based approximate joint compatibility test. IEEE Trans. Ind. Inform. 2014, 10, 331–339. [Google Scholar] [CrossRef]
- Kazhdan, M.; Bolitho, M.; Hoppe, H. Poisson surface reconstruction. In Proceedings of the Fourth Eurographics Symposium on Geometry Processing, Cagliari, Italy, 26–28 June 2006; Volume 7. [Google Scholar]
- Horn, B.K.; Brooks, M.J. Shape from Shading; MIT Press: Cambridge, MA, USA, 1989. [Google Scholar]
- Durrant-Whyte, H.; Bailey, T. Simultaneous localization and mapping: Part I. Robot. Autom. Mag. IEEE 2006, 13, 99–110. [Google Scholar] [CrossRef]
- Li, Y.; Olson, E.B. Extracting general-purpose features from LIDAR data. In Proceedings of the Robotics and Automation (ICRA), 2010 IEEE International Conference, Anchorage, AK, USA, 3–8 May 2010; IEEE: New York, NY, USA, 2010; pp. 1388–1393. [Google Scholar]
- Li, Y. Research on Robust Mapping Methods in Unstructured Environments. Ph.D. Thesis, University of Science and Technology of China, Hefei, China, 2010. [Google Scholar]
- Li, Y.; Olson, E.B. Structure tensors for general purpose LIDAR feature extraction. In Proceedings of the Robotics and Automation (ICRA), 2011 IEEE International Conference on, Shanghai, China, 9–13 May 2011; IEEE: New York, NY, USA, 2011; pp. 1869–1874. [Google Scholar]
- Izadi, S.; Kim, D.; Hilliges, O.; Molyneaux, D.; Newcombe, R.; Kohli, P.; Shotton, J.; Hodges, S.; Freeman, D.; Davison, A.; et al. Kinectfusion: Real-time 3d reconstruction and interaction using a moving depth camera. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, Santa Barbara, CA, USA, 16–19 October 2011; pp. 559–568. [Google Scholar]
- Whelan, T.; Salas-Moreno, R.F.; Glocker, B.; Davison, A.J.; Leutenegger, S. ElasticFusion: Real-time dense SLAM and light source estimation. Int. J. Robot. Res. 2016, 35, 1697–1716. [Google Scholar] [CrossRef]
- Engel, J.; Schöps, T.; Cremers, D. LSD-SLAM: Large-scale direct monocular SLAM. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part II 13. Springer: Cham, Switzerland, 2014; pp. 834–849. [Google Scholar]
- Newcombe, R.A.; Lovegrove, S.J.; Davison, A.J. DTAM: Dense tracking and mapping in real-time. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; IEEE: New York, NY, USA, 2011; pp. 2320–2327. [Google Scholar]
- LeCun, Y.; Bengio, Y. Convolutional networks for images, speech, and time series. In The Handbook of Brain Theory and Neural Networks; MIT Press: Cambridge, MA, USA, 1995; Volume 3361. [Google Scholar]
- Qin, F.; Li, Y.; Su, Y.H.; Xu, D.; Hannaford, B. Surgical instrument segmentation for endoscopic vision with data fusion of cnn prediction and kinematic pose. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; IEEE: New York, NY, USA, 2019; pp. 9821–9827. [Google Scholar]
- Lin, S.; Qin, F.; Li, Y.; Bly, R.A.; Moe, K.S.; Hannaford, B. LC-GAN: Image-to-image Translation Based on Generative Adversarial Network for Endoscopic Images. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24–30 October 2020; pp. 2914–2920. [Google Scholar] [CrossRef]
- Qin, F.; Lin, S.; Li, Y.; Bly, R.A.; Moe, K.S.; Hannaford, B. Towards better surgical instrument segmentation in endoscopic vision: Multi-angle feature aggregation and contour supervision. IEEE Robot. Autom. Lett. 2020, 5, 6639–6646. [Google Scholar] [CrossRef]
- Huang, P.H.; Matzen, K.; Kopf, J.; Ahuja, N.; Huang, J.B. Deepmvs: Learning multi-view stereopsis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2821–2830. [Google Scholar]
- Alhashim, I.; Wonka, P. High quality monocular depth estimation via transfer learning. arXiv 2018, arXiv:1812.11941. [Google Scholar]
- Zhou, T.; Brown, M.; Snavely, N.; Lowe, D.G. Unsupervised learning of depth and ego-motion from video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1851–1858. [Google Scholar]
- Çiçek, Ö.; Abdulkadir, A.; Lienkamp, S.S.; Brox, T.; Ronneberger, O. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, 17–21 October 2016; Proceedings, Part II 19. Springer: Berlin, Germany, 2016; pp. 424–432. [Google Scholar]
- Riegler, G.; Osman Ulusoy, A.; Geiger, A. Octnet: Learning deep 3d representations at high resolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3577–3586. [Google Scholar]
- Zhou, Y.; Tuzel, O. Voxelnet: End-to-end learning for point cloud based 3d object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4490–4499. [Google Scholar]
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning internal representations by error propagation. Technical Report; Institute for Cognitive Science, University of California, San Diego: La Jolla, CA, USA, 1985. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Qi, Y.; Jin, L.; Li, H.; Li, Y.; Liu, M. Discrete Computational Neural Dynamics Models for Solving Time-Dependent Sylvester Equations with Applications to Robotics and MIMO Systems. IEEE Trans. Ind. Inform. 2020, 16, 6231–6241. [Google Scholar] [CrossRef]
- Li, Y.; Li, S.; Hannaford, B. A model based recurrent neural network with randomness for efficient control with applications. IEEE Trans. Ind. Inform. 2018, 15, 2054–2063. [Google Scholar] [CrossRef] [PubMed]
- Wang, S.; Clark, R.; Wen, H.; Trigoni, N. Deepvo: Towards end-to-end visual odometry with deep recurrent convolutional neural networks. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; IEEE: New York, NY, USA, 2017; pp. 2043–2050. [Google Scholar]
- Jin, L.; Li, S.; Luo, X.; Li, Y.; Qin, B. Neural dynamics for cooperative control of redundant robot manipulators. IEEE Trans. Ind. Inform. 2018, 14, 3812–3821. [Google Scholar] [CrossRef]
- Li, Y.; Li, S.; Miyasaka, M.; Lewis, A.; Hannaford, B. Improving Control Precision and Motion Adaptiveness for Surgical Robot with Recurrent Neural Network. In Proceedings of the Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ International Conference, Vancouver, BC, Canada, 24–28 September 2017; IEEE: New York, NY, USA, 2017; pp. 1–6. [Google Scholar]
- Ummenhofer, B.; Zhou, H.; Uhrig, J.; Mayer, N.; Ilg, E.; Dosovitskiy, A.; Brox, T. Demon: Depth and motion network for learning monocular stereo. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5038–5047. [Google Scholar]
- Li, R.; Wang, S.; Long, Z.; Gu, D. Undeepvo: Monocular visual odometry through unsupervised deep learning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; IEEE: New York, NY, USA, 2018; pp. 7286–7291. [Google Scholar]
- Li, S.; Li, Y. Nonlinearly activated neural network for solving time-varying complex sylvester equation. IEEE Trans. Cybern. 2014, 44, 1397–1407. [Google Scholar] [CrossRef]
- Li, S.; He, J.; Li, Y.; Rafique, M.U. Distributed recurrent neural networks for cooperative control of manipulators: A game-theoretic perspective. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 415–426. [Google Scholar] [CrossRef]
- Johnson, M.J.; Duvenaud, D.K.; Wiltschko, A.; Adams, R.P.; Datta, S.R. Composing graphical models with neural networks for structured representations and fast inference. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar]
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660. [Google Scholar]
- Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph cnn for learning on point clouds. Acm Trans. Graph. (Tog) 2019, 38, 1–12. [Google Scholar] [CrossRef]
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
- Yi, H.C.; You, Z.H.; Huang, D.S.; Guo, Z.H.; Chan, K.C.; Li, Y. Learning Representations to Predict Intermolecular Interactions on Large-Scale Heterogeneous Molecular Association Network. Iscience 2020, 23, 101261. [Google Scholar] [CrossRef]
- Chen, Z.H.; Li, L.P.; He, Z.; Zhou, J.R.; Li, Y.; Wong, L. An Improved Deep Forest Model for Predicting Self-Interacting Proteins From Protein Sequence Using Wavelet Transformation. Front. Genet. 2019, 10, 90. [Google Scholar] [CrossRef] [PubMed]
- Wang, L.; You, Z.H.; Chen, X.; Li, Y.; Dong, Y.N.; Li, L.P.; Zheng, K. LMTRDA: Using logistic model tree to predict MiRNA-disease associations by fusing multi-source information of sequences and similarities. PLoS Comput. Biol. 2019, 15, e1006865. [Google Scholar] [CrossRef] [PubMed]
- Wang, L.; You, Z.H.; Li, Y.; Zheng, K.; Huang, Y.A. GCNCDA: A new method for predicting circRNA-disease associations based on Graph Convolutional Network Algorithm. PLOS Comput. Biol. 2020, 16, e1007568. [Google Scholar] [CrossRef] [PubMed]
- Wu, J.; Zhang, C.; Xue, T.; Freeman, B.; Tenenbaum, J. Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar]
- Xie, H.; Yao, H.; Sun, X.; Zhou, S.; Zhang, S. Pix2vox: Context-aware 3d reconstruction from single and multi-view images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2690–2698. [Google Scholar]
- Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1912–1920. [Google Scholar]
- Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106. [Google Scholar] [CrossRef]
- Remondino, F.; Karami, A.; Yan, Z.; Mazzacca, G.; Rigon, S.; Qin, R. A critical analysis of nerf-based 3d reconstruction. Remote Sens. 2023, 15, 3585. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
- Scharstein, D.; Szeliski, R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 2002, 47, 7–42. [Google Scholar] [CrossRef]
- Schöps, T.; Schönberger, J.L.; Galliani, S.; Sattler, T.; Schindler, K.; Pollefeys, M.; Geiger, A. A Multi-View Stereo Benchmark with High-Resolution Images and Multi-Camera Videos. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012. [Google Scholar]
- Xiao, J.; Owens, A.; Torralba, A. Sun3d: A database of big spaces reconstructed using sfm and object labels. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 1625–1632. [Google Scholar]
- Chang, A.; Dai, A.; Funkhouser, T.; Halber, M.; Niessner, M.; Savva, M.; Song, S.; Zeng, A.; Zhang, Y. Matterport3D: Learning from RGB-D Data in Indoor Environments. In Proceedings of the International Conference on 3D Vision (3DV), Qingdao, China, 10–12 October 2017. [Google Scholar]
- Aanæs, H.; Jensen, R.R.; Vogiatzis, G.; Tola, E.; Dahl, A.B. Large-scale data for multiple-view stereopsis. Int. J. Comput. Vis. 2016, 120, 153–168. [Google Scholar] [CrossRef]
- Yao, Y.; Luo, Z.; Li, S.; Zhang, J.; Ren, Y.; Zhou, L.; Fang, T.; Quan, L. BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Networks. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020. [Google Scholar]
- Knapitsch, A.; Park, J.; Zhou, Q.Y.; Koltun, V. Tanks and Temples: Benchmarking Large-Scale Scene Reconstruction. ACM Trans. Graphics 2017, 36, 78. [Google Scholar] [CrossRef]
- Sturm, J.; Engelhard, N.; Endres, F.; Burgard, W.; Cremers, D. A Benchmark for the Evaluation of RGB-D SLAM Systems. In Proceedings of the International Conference on Intelligent Robot Systems (IROS), Vilamoura, Portugal, 7–12 October 2012. [Google Scholar]
- Silberman, N.; Hoiem, D.; Kohli, P.; Fergus, R. Indoor Segmentation and Support Inference from RGBD Images. In Proceedings of the ECCV, Florence, Italy, 7–13 October 2012. [Google Scholar]
- Hua, B.S.; Pham, Q.H.; Nguyen, D.T.; Tran, M.K.; Yu, L.F.; Yeung, S.K. Scenenn: A scene meshes dataset with annotations. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; IEEE: New York, NY, USA, 2016; pp. 92–101. [Google Scholar]
- Li, Z.; Snavely, N. MegaDepth: Learning Single-View Depth Prediction from Internet Photos. In Proceedings of the Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Wang, P.; Huang, X.; Cheng, X.; Zhou, D.; Geng, Q.; Yang, R. The apolloscape open dataset for autonomous driving and its application. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 42, 2702–2719. [Google Scholar]
- Li, Z.; Yu, T.W.; Sang, S.; Wang, S.; Song, M.; Liu, Y.; Yeh, Y.Y.; Zhu, R.; Gundavarapu, N.; Shi, J.; et al. Openrooms: An open framework for photorealistic indoor scene datasets. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 7190–7199. [Google Scholar]
- Valada, A.; Oliveira, G.; Brox, T.; Burgard, W. Deep Multispectral Semantic Scene Understanding of Forested Environments using Multimodal Fusion. In Proceedings of the International Symposium on Experimental Robotics (ISER), Nagasaki, Japan, 3–8 October 2016. [Google Scholar]
- Zioulis, N.; Karakottas, A.; Zarpalas, D.; Alvarez, F.; Daras, P. Spherical View Synthesis for Self-Supervised 360° Depth Estimation. In Proceedings of the International Conference on 3D Vision (3DV), Québec City, QC, Canada, 16–19 September 2019. [Google Scholar]
- Zamir, A.R.; Sax, A.; Shen, W.B.; Guibas, L.; Malik, J.; Savarese, S. Taskonomy: Disentangling Task Transfer Learning. In Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; IEEE: New York, NY, USA, 2018. [Google Scholar]
- Straub, J.; Whelan, T.; Ma, L.; Chen, Y.; Wijmans, E.; Green, S.; Engel, J.J.; Mur-Artal, R.; Ren, C.; Verma, S.; et al. The Replica dataset: A digital replica of indoor spaces. arXiv 2019, arXiv:1906.05797. [Google Scholar]
- Allan, M.; Mcleod, J.; Wang, C.; Rosenthal, J.C.; Hu, Z.; Gard, N.; Eisert, P.; Fu, K.X.; Zeffiro, T.; Xia, W.; et al. Stereo correspondence and reconstruction of endoscopic data challenge. arXiv 2021, arXiv:2101.01133. [Google Scholar]
- Ozyoruk, K.B.; Gokceler, G.I.; Coskun, G.; Incetan, K.; Almalioglu, Y.; Mahmood, F.; Curto, E.; Perdigoto, L.; Oliveira, M.; Sahin, H.; et al. EndoSLAM Dataset and An Unsupervised Monocular Visual Odometry and Depth Estimation Approach for Endoscopic Videos: Endo-SfMLearner. arXiv 2020, arXiv:2006.16670. [Google Scholar] [CrossRef] [PubMed]
- Li, Y.; Zhang, J.; Li, S. STMVO: Biologically inspired monocular visual odometry. Neural Comput. Appl. 2018, 29, 215–225. [Google Scholar] [CrossRef]
- Eigen, D.; Fergus, R. Depth map prediction from a single image using a multi-scale deep network. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2366–2374. [Google Scholar]
- Laina, I.; Rupprecht, C.; Belagiannis, V.; Tombari, F.; Navab, N. Deeper depth prediction with fully convolutional residual networks. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; IEEE: New York, NY, USA, 2016; pp. 239–248. [Google Scholar]
- Xu, D.; Ricci, E.; Ouyang, W.; Wang, X.; Sebe, N. Multi-scale continuous CRFs as sequential deep networks for monocular depth estimation. Pattern Recognit. 2017, 80, 152–162. [Google Scholar]
- Fu, H.; Gong, M.; Wang, C.; Batmanghelich, K.; Tao, D. Deep ordinal regression network for monocular depth estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2002–2011. [Google Scholar]
- Chen, Q.; Cao, Y.; Wu, Q.; Shi, Q.; Zeng, B. Learning monocular depth estimation infusing traditional stereo knowledge. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 6406–6415. [Google Scholar]
- Eigen, D.; Fergus, R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 2650–2658. [Google Scholar]
- Liu, F.; Shen, C.; Lin, G. Learning depth from single monocular images using deep convolutional neural fields. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 2024–2039. [Google Scholar] [CrossRef]
- Godard, C.; Mac Aodha, O.; Brostow, G.J. Unsupervised monocular depth estimation with left-right consistency. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6602–6611. [Google Scholar]
- Kuznietsov, Y.; Stückler, J.; Leibe, B. Semi-supervised deep learning for monocular depth map prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6647–6655. [Google Scholar]
- Zhang, Z.; Galvez-Lopez, D.; Garg, R.; Scaramuzza, D. DeepV2D: Video to Depth with Differentiable Structure from Motion. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 30 April 2020. [Google Scholar]
- Li, B.; Shen, C.; Dai, Y.; van den Hengel, A.; He, M. Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1119–1127. [Google Scholar]
- Engel, J.; Stückler, J.; Cremers, D. Large-scale direct SLAM with stereo cameras. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015; IEEE: New York, NY, USA, 2015; pp. 1935–1942. [Google Scholar]
- Melekhov, I.; Ylimäki, M.; Kannala, J. RAFT-3D: Scene Flow estimation from RGB-D images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 7482–7491. [Google Scholar]
- Yang, R.; Dai, Y.; Li, H. Deep virtual stereo odometry: Leveraging deep depth prediction for monocular direct sparse odometry. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 842–857. [Google Scholar]
- Visentini-Scarzanella, M.; Sugiura, T.; Kaneko, T.; Koto, S. Deep monocular 3D reconstruction for assisted navigation in bronchoscopy. Int. J. Comput. Assist. Radiol. Surg. 2017, 12, 1089–1099. [Google Scholar] [CrossRef]
- Tateno, K.; Tombari, F.; Laina, I.; Navab, N. Cnn-slam: Real-time dense monocular slam with learned depth prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6243–6252. [Google Scholar]
- Ma, R.; Wang, R.; Pizer, S.; Rosenman, J.; McGill, S.K.; Frahm, J.M. Real-time 3D reconstruction of colonoscopic surfaces for determining missing regions. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Shenzhen, China, 13–17 October 2019; Springer: Cham, Switzerland, 2019; pp. 573–582. [Google Scholar]
- Lurie, K.L.; Angst, R.; Zlatev, D.V.; Liao, J.C.; Bowden, A.K.E. 3D reconstruction of cystoscopy videos for comprehensive bladder records. Biomed. Opt. Express 2017, 8, 2106–2123. [Google Scholar] [CrossRef]
- Yang, Z.; Simon, R.; Li, Y.; Linte, C.A. Dense Depth Estimation from Stereo Endoscopy Videos Using Unsupervised Optical Flow Methods. In Proceedings of the Annual Conference on Medical Image Understanding and Analysis, Oxford, UK, 12–14 July 2021; Springer: Cham, Switzerland, 2021; pp. 337–349. [Google Scholar]
- Wimbauer, F.; Yang, N.; von Stumberg, L.; Zeller, N.; Cremers, D. MonoRec: Semi-supervised dense reconstruction in dynamic environments from a single moving camera. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6112–6122. [Google Scholar]
- Fehr, M.; Furrer, F.; Dryanovski, I.; Sturm, J.; Gilitschenski, I.; Siegwart, R.; Cadena, C. TSDF-based change detection for consistent long-term dense reconstruction and dynamic object discovery. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; IEEE: New York, NY, USA, 2017; pp. 5237–5244. [Google Scholar]
- Bârsan, I.A.; Liu, P.; Pollefeys, M.; Geiger, A. Robust dense mapping for large-scale dynamic environments. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; IEEE: New York, NY, USA, 2018; pp. 7510–7517. [Google Scholar]
- Saputra, M.R.U.; Markham, A.; Trigoni, N. Visual SLAM and structure from motion in dynamic environments: A survey. ACM Comput. Surveys (CSUR) 2018, 51, 1–36. [Google Scholar] [CrossRef]
- Li, Y.; Li, S.; Ge, Y. A biologically inspired solution to simultaneous localization and consistent mapping in dynamic environments. Neurocomputing 2013, 104, 170–179. [Google Scholar] [CrossRef]
- Seok Lee, H.; Mu Lee, K. Dense 3d reconstruction from severely blurred images using a single moving camera. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 273–280. [Google Scholar]
- Saxena, R.C.; Friedman, S.; Bly, R.A.; Otjen, J.; Alessio, A.M.; Li, Y.; Hannaford, B.; Whipple, M.; Moe, K.S. Comparison of Micro–Computed Tomography and Clinical Computed Tomography Protocols for Visualization of Nasal Cartilage Before Surgical Planning for Rhinoplasty. JAMA Facial Plast. Surg. 2019, 21, 3. [Google Scholar] [CrossRef] [PubMed]
- Chen, R.J.; Bobrow, T.L.; Athey, T.; Mahmood, F.; Durr, N.J. Slam endoscopy enhanced by adversarial depth prediction. arXiv 2019, arXiv:1907.00283. [Google Scholar]
- Scaramuzza, D.; Fraundorfer, F. Visual Odometry [Tutorial]. Robot. Autom. Mag. IEEE 2011, 18, 80–92. [Google Scholar] [CrossRef]
- Adidharma, L.; Yang, Z.; Young, C.; Li, Y.; Hannaford, B.; Humphreys, I.; Abuzeid, W.M.; Ferreira, M.; Moe, K.S.; Bly, R.A. Semiautomated Method for Editing Surgical Videos. J. Neurol. Surg. Part B Skull Base 2021, 82, P057. [Google Scholar]
- Lamarca, J.; Parashar, S.; Bartoli, A.; Montiel, J. Defslam: Tracking and mapping of deforming scenes from monocular sequences. IEEE Trans. Robot. 2020, 37, 291–303. [Google Scholar] [CrossRef]
- Turan, M.; Almalioglu, Y.; Araujo, H.; Konukoglu, E.; Sitti, M. A non-rigid map fusion-based direct SLAM method for endoscopic capsule robots. Int. J. Intell. Robot. Appl. 2017, 1, 399–409. [Google Scholar] [CrossRef]
- Li, Y.; Hannaford, B. Soft-obstacle Avoidance for Redundant Manipulators with Recurrent Neural Network. In Proceedings of the Intelligent Robots and Systems (IROS), 2018 IEEE/RSJ International Conference, IEEE, Madrid, Spain, 1–5 October 2018; pp. 1–6. [Google Scholar]
- Péntek, Q.; Hein, S.; Miernik, A.; Reiterer, A. Image-based 3D surface approximation of the bladder using structure-from-motion for enhanced cystoscopy based on phantom data. Biomed. Eng. Biomed. Tech. 2018, 63, 461–466. [Google Scholar] [CrossRef] [PubMed]
- Li, Y.; Li, S.; Hannaford, B. A Novel Recurrent Neural Network Control Scheme for Improving Redundant Manipulator Motion Planning Completeness. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; IEEE: New York, NY, USA, 2018; pp. 1–6. [Google Scholar]
- Li, Y.; Hannaford, B.; Humphreys, I.; Moe, K.S.; Bly, R.A. Learning Surgical Motion Pattern from Small Data in Endoscopic Sinus and Skull Base Surgeries. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; IEEE: New York, NY, USA, 2021; pp. 1–6. [Google Scholar]
- Li, Y.; Bly, R.; Whipple, M.; Humphreys, I.; Hannaford, B.; Moe, K. Use Endoscope and Instrument and Pathway Relative Motion as Metric for Automated Objective Surgical Skill Assessment in Skull base and Sinus Surgery. J. Neurol. Surg. Part B Skull Base 2018, 79, A194. [Google Scholar] [CrossRef]
- Li, Y.; Bly, R.; Humphreys, I.; Whipple, M.; Hannaford, B.; Moe, K. Surgical Motion based Automatic Objective Surgical Completeness Assessment in Endoscopic Skull Base and Sinus Surgery. J. Neurol. Surg. Part Skull Base 2018, 79, P193. [Google Scholar] [CrossRef]
- Mahmoud, N.; Cirauqui, I.; Hostettler, A.; Doignon, C.; Soler, L.; Marescaux, J.; Montiel, J. ORBSLAM-based endoscope tracking and 3D reconstruction. In Proceedings of the International Workshop on Computer-Assisted and Robotic Endoscopy, Athens, Greece, 17 October 2016; Springer: Berlin, Germany, 2016; pp. 72–83. [Google Scholar]
- Soper, T.D.; Porter, M.P.; Seibel, E.J. Surface mosaics of the bladder reconstructed from endoscopic video for automated surveillance. IEEE Trans. Biomed. Eng. 2012, 59, 1670–1680. [Google Scholar] [CrossRef]
- Okatani, T.; Deguchi, K. Shape reconstruction from an endoscope image by shape from shading technique for a point light source at the projection center. Comput. Vis. Image Underst. 1997, 66, 119–131. [Google Scholar] [CrossRef]
- Davison, A.J.; Reid, I.D.; Molton, N.D.; Stasse, O. MonoSLAM: Real-time single camera SLAM. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 1052–1067. [Google Scholar] [CrossRef]
- Yu, C.; Liu, Z.; Liu, X.J.; Xie, F.; Yang, Y.; Wei, Q.; Fei, Q. DS-SLAM: A semantic visual SLAM towards dynamic environments. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; IEEE: New York, NY, USA, 2018; pp. 1168–1174. [Google Scholar]
- Milford, M.J.; Wyeth, G.F. SeqSLAM: Visual route-based navigation for sunny summer days and stormy winter nights. In Proceedings of the 2012 IEEE International Conference on Robotics and Automation, Saint Paul, MN, USA, 14–18 May 2012; IEEE: New York, NY, USA, 2012; pp. 1643–1649. [Google Scholar]
- Pepperell, E.; Corke, P.; Milford, M. Routed roads: Probabilistic vision-based place recognition for changing conditions, split streets and varied viewpoints. Int. J. Robot. Res. 2016, 35, 1057–1179. [Google Scholar] [CrossRef]
- Yang, S.; Song, Y.; Kaess, M.; Scherer, S. Pop-up SLAM: Semantic monocular plane SLAM for low-texture environments. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea, 9–14 October 2016; IEEE: New York, NY, USA, 2016; pp. 1222–1229. [Google Scholar]
- Gomez-Ojeda, R. Robust Visual SLAM in Challenging Environments with Low-Texture and Dynamic Illumination; UMA Editorial: Málaga, Spain, 2020. [Google Scholar]
- Lee, H.S.; Kwon, J.; Lee, K.M. Simultaneous localization, mapping and deblurring. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; IEEE: New York, NY, USA, 2011; pp. 1203–1210. [Google Scholar]
- Williams, B.; Klein, G.; Reid, I. Real-time SLAM relocalisation. In Proceedings of the 2007 IEEE 11th International Conference on Computer Vision, Rio De Janeiro, Brazil, 14–21 October 2007; IEEE: New York, NY, USA, 2007; pp. 1–8. [Google Scholar]
- Hsiao, M.; Kaess, M. Mh-isam2: Multi-hypothesis isam using bayes tree and hypo-tree. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; IEEE: New York, NY, USA, 2019; pp. 1274–1280. [Google Scholar]
- Vasconcelos, F.; Mazomenos, E.; Kelly, J.; Stoyanov, D. RCM-SLAM: Visual localisation and mapping under remote centre of motion constraints. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; IEEE: New York, NY, USA, 2019; pp. 9278–9284. [Google Scholar]
- Mur-Artal, R.; Tardós, J.D. Probabilistic Semi-Dense Mapping from Highly Accurate Feature-Based Monocular SLAM. In Proceedings of the Robotics: Science and Systems, Rome, Italy, 13–17 July 2015; Volume 2015. [Google Scholar]
- Wu, Y.; Zhang, Y.; Zhu, D.; Feng, Y.; Coleman, S.; Kerr, D. EAO-SLAM: Monocular semi-dense object SLAM based on ensemble data association. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 25–29 October 2020; IEEE: New York, NY, USA, 2020; pp. 4966–4973. [Google Scholar]
- Wen, S.; Zhao, Y.; Liu, X.; Sun, F.; Lu, H.; Wang, Z. Hybrid Semi-Dense 3D Semantic-Topological Mapping from Stereo Visual-Inertial Odometry SLAM with Loop Closure Detection. IEEE Trans. Veh. Technol. 2020, 69, 16057–16066. [Google Scholar] [CrossRef]
- Mahmoud, N.; Hostettler, A.; Collins, T.; Soler, L.; Doignon, C.; Montiel, J. SLAM based quasi dense reconstruction for minimally invasive surgery scenes. arXiv 2017, arXiv:1705.09107. [Google Scholar]
- Newcombe, R. Dense Visual SLAM. Ph.D. Thesis, Imperial College London, London, UK, 2012. [Google Scholar]
- Li, Y. Deep Causal Learning for Robotic Intelligence. Front. Neurorobot. 2023, 1–27. [Google Scholar] [CrossRef]
- Pearl, J. Causality; Cambridge University Press: Cambridge, UK, 2009. [Google Scholar]
- Yao, L.; Chu, Z.; Li, S.; Li, Y.; Gao, J.; Zhang, A. A survey on causal inference. ACM Trans. Knowl. Discov. Data (TKDD) 2021, 15, 1–46. [Google Scholar] [CrossRef]
Dataset | Year of Creation | Scenes (Count; In = Indoor, Out = Outdoor, Med = Medical) | Size | Source of Depth | Camera Pose
---|---|---|---|---|---
ShapeNet | 2015 | 31,350, In & Out | 300 M | Synthetic | No |
Middlebury Stereo | 2001 | 47, In | 47 pairs | Structured light, Stereo | Yes |
KITTI Vision | 2012 | ∼28, Out | 42,382 | Velodyne LiDAR | Yes |
ETH3D | 2017 | 27, In & Out | 27 sets | Laser scanner | Yes |
NYU Depth V2 | 2012 | 1449, In | 144,959 | Kinect | Yes |
SUN3D | 2013 | 415, In & Out | N/A | Kinect, Xtion | Yes |
TUM RGB-D | 2012 | 39, In | N/A | Kinect | Yes |
ICL-NUIM | 2014 | 8, In | N/A | Synthetic | Yes |
EuRoC MAV | 2016 | 11, In | N/A | Laser scanner | Yes |
ApolloScape | 2018 | N/A, Out | >140,000 | LiDAR | Yes |
ScanNet | 2017 | 2513, In | N/A | Kinect v2, RealSense | Yes |
Matterport3D | 2017 | 90, In & Out | N/A | Matterport camera | Yes |
Stanford 2D-3D-S | 2017 | 6 areas, In | 70,496 | Matterport camera | Yes |
SceneNet RGB-D | 2016 | 5 million, In | 5 million | Synthetic | Yes |
Sintel | 2010 | N/A, In & Out | 1064 | Synthetic | No |
Redwood | 2016 | 100, In | N/A | Structure sensor | Yes |
FlyingThings3D | 2016 | N/A, In & Out | 3720 | Synthetic | Yes |
7-Scenes | 2014 | 7, In | N/A | Kinect | Yes |
Washington RGB-D | 2011 | 300, In | N/A | Kinect | Yes |
Blensor | 2013 | N/A, In & Out | N/A | Synthetic | Yes |
DTU Robot | 2014 | 124, In | 5000+ | Structured light | Yes |
Stanford 3D | 2006 | N/A, In & Out | N/A | Range scans | Yes |
Freiburg Forest | 2016 | 1, Out | N/A | Stereo | Yes |
SCARED | 2017 | 7, Med | 15,000 | Kinect/Synthetic | Yes |
EndoSLAM | 2016 | 35, Med | 60,000 | CT | Yes |
Algorithm | RMSE (m) | Rel Error | δ < 1.25 | δ < 1.25² | δ < 1.25³
---|---|---|---|---|---
[94] | 0.641 | 0.214 | 0.611 | 0.887 | 0.971 |
[95] | 0.573 | 0.127 | 0.811 | 0.953 | 0.988 |
[96] | 0.523 | 0.120 | 0.838 | 0.976 | 0.997 |
[84] | - | - | 0.821 | 0.965 | 0.995 |
[97] | 0.471 | 0.187 | 0.815 | 0.955 | 0.988 |
[98] | - | - | 0.852 | 0.970 | 0.994 |
Algorithm | RMSE (m) | Rel Error | δ < 1.25 | δ < 1.25² | δ < 1.25³
---|---|---|---|---|---
[99] | 6.266 | 0.203 | 0.696 | 0.900 | 0.967 |
[100] | 4.627 | 0.117 | 0.845 | 0.951 | 0.984 |
[101] | 4.863 | 0.187 | 0.809 | 0.953 | 0.986 |
[102] | 4.459 | 0.115 | 0.861 | 0.961 | 0.986 |
[103] | 4.401 | 0.112 | 0.868 | 0.967 | 0.991 |
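The RMSE, relative error, and the three threshold-accuracy columns (δ < 1.25, δ < 1.25², δ < 1.25³) reported above are the standard monocular depth evaluation metrics. The short NumPy sketch below shows how they are typically computed over valid ground-truth pixels; the function name and the synthetic inputs at the end are placeholders for illustration only.

```python
import numpy as np

def depth_metrics(pred, gt, eps=1e-6):
    """Standard monocular depth metrics over valid (gt > 0) pixels."""
    mask = gt > eps
    pred, gt = pred[mask], gt[mask]
    rmse = np.sqrt(np.mean((pred - gt) ** 2))                 # RMSE in metres
    abs_rel = np.mean(np.abs(pred - gt) / gt)                 # relative error
    ratio = np.maximum(pred / gt, gt / pred)                  # per-pixel ratio
    deltas = [np.mean(ratio < 1.25 ** i) for i in (1, 2, 3)]  # accuracy thresholds
    return rmse, abs_rel, deltas

# Usage with synthetic depth maps (placeholders, not benchmark data)
gt = np.random.uniform(0.5, 10.0, size=(480, 640))
pred = gt * np.random.uniform(0.9, 1.1, size=gt.shape)
print(depth_metrics(pred, gt))
```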