Abstract
Structured light-based depth sensors provide accurate depth information independently of the scene appearance by extracting pattern positions from the captured pixel intensities. Spatial neighborhood encoding, in particular, is a popular structured light approach for off-the-shelf hardware. However, it suffers from distortion and fragmentation of the projected pattern by the scene's geometry in the vicinity of a pixel. This forces algorithms to find a delicate balance between depth prediction accuracy and robustness to pattern fragmentation or appearance change. While stereo matching provides more robustness at the expense of accuracy, we show that learning to regress a pixel's position within the projected pattern is not only more accurate when combined with classification but can be made equally robust. We propose to split the regression problem into smaller classification sub-problems in a coarse-to-fine manner with the use of a weight-adaptive layer that efficiently implements branching per-pixel Multilayer Perceptrons applied to features extracted by a Convolutional Neural Network. As our approach requires full supervision, we train our algorithm on a rendered dataset sufficiently close to the real-world domain. On a separately captured real-world dataset, we show that our network outperforms the state of the art and is significantly more robust than other regression-based approaches.
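To make the coarse-to-fine idea concrete, the following is a minimal sketch of how a continuous pattern x-position can be decomposed into a stack of classification targets plus a final regression residual. The pattern width, branching factors, and function names here are illustrative assumptions, not the authors' actual configuration; the point is only that each level classifies the pixel into one of a few cells of its parent cell, and a residual regression recovers sub-cell precision.

```python
def encode(x, width=1280.0, branching=(8, 8, 8)):
    """Split position x in [0, width) into per-level class labels
    (coarse-to-fine) and a residual in [0, 1) within the finest cell.

    Hypothetical parameters: a 1280-pixel-wide pattern split by
    three levels of 8-way classification (8*8*8 = 512 finest cells).
    """
    labels = []
    lo, size = 0.0, width
    for b in branching:
        size /= b                          # cell size at this level
        k = min(int((x - lo) / size), b - 1)  # which sub-cell x falls in
        labels.append(k)
        lo += k * size                     # left edge of the chosen cell
    residual = (x - lo) / size             # sub-cell offset, the regression target
    return labels, residual


def decode(labels, residual, width=1280.0, branching=(8, 8, 8)):
    """Reconstruct the continuous position from labels and residual."""
    lo, size = 0.0, width
    for b, k in zip(branching, labels):
        size /= b
        lo += k * size
    return lo + residual * size


labels, r = encode(733.25)
assert abs(decode(labels, r) - 733.25) < 1e-9
```

In the paper's architecture, each per-level classification is produced by a small per-pixel MLP whose weights are selected by the parent level's decision (the "branching" of the weight-adaptive layer); this sketch only shows the target decomposition that such a hierarchy predicts.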
Notes
- 1. Randomly selected textures applied to the wall planes as well as to cube, sphere, cylinder, and pill shapes.
Acknowledgements
The research leading to these results has received funding from EC Horizon 2020 for Research and Innovation under grant agreement No. 101017089, TraceBot and the Austrian Science Foundation (FWF) under grant agreement No. I3969-N30, InDex.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Schreiberhuber, S., Weibel, JB., Patten, T., Vincze, M. (2022). GigaDepth: Learning Depth from Structured Light with Branching Neural Networks. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13693. Springer, Cham. https://doi.org/10.1007/978-3-031-19827-4_13
DOI: https://doi.org/10.1007/978-3-031-19827-4_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19826-7
Online ISBN: 978-3-031-19827-4