Abstract
Structured light-based depth sensors provide accurate depth information independently of the scene appearance by extracting pattern positions from the captured pixel intensities. Spatial neighborhood encoding, in particular, is a popular structured light approach for off-the-shelf hardware. However, it suffers from distortion and fragmentation of the projected pattern by the scene's geometry in the vicinity of a pixel. This forces algorithms to find a delicate balance between depth prediction accuracy and robustness to pattern fragmentation or appearance change. While stereo matching provides more robustness at the expense of accuracy, we show that learning to regress a pixel's position within the projected pattern is not only more accurate when combined with classification but can be made equally robust. We propose to split the regression problem into smaller classification sub-problems in a coarse-to-fine manner with the use of a weight-adaptive layer that efficiently implements branching per-pixel Multilayer Perceptrons applied to features extracted by a Convolutional Neural Network. As our approach requires full supervision, we train our algorithm on a rendered dataset sufficiently close to the real-world domain. On a separately captured real-world dataset, we show that our network outperforms the state of the art and is significantly more robust than other regression-based approaches.
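To make the coarse-to-fine idea concrete, the following is a minimal sketch of how a continuous pattern x-position can be decomposed into a stack of classification targets plus a final regression residual. The pattern width, branching factors, and function names here are illustrative assumptions, not the authors' actual configuration; the point is only that each level classifies the pixel into one of a few cells of its parent cell, and a residual regression recovers sub-cell precision.

```python
def encode(x, width=1280.0, branching=(8, 8, 8)):
    """Split position x in [0, width) into per-level class labels
    (coarse-to-fine) and a residual in [0, 1) within the finest cell.

    Hypothetical parameters: a 1280-pixel-wide pattern split by
    three levels of 8-way classification (8*8*8 = 512 finest cells).
    """
    labels = []
    lo, size = 0.0, width
    for b in branching:
        size /= b                          # cell size at this level
        k = min(int((x - lo) / size), b - 1)  # which sub-cell x falls in
        labels.append(k)
        lo += k * size                     # left edge of the chosen cell
    residual = (x - lo) / size             # sub-cell offset, the regression target
    return labels, residual


def decode(labels, residual, width=1280.0, branching=(8, 8, 8)):
    """Reconstruct the continuous position from labels and residual."""
    lo, size = 0.0, width
    for b, k in zip(branching, labels):
        size /= b
        lo += k * size
    return lo + residual * size


labels, r = encode(733.25)
assert abs(decode(labels, r) - 733.25) < 1e-9
```

In the paper's architecture, each per-level classification is produced by a small per-pixel MLP whose weights are selected by the parent level's decision (the "branching" of the weight-adaptive layer); this sketch only shows the target decomposition that such a hierarchy predicts.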
Notes
- 1. Randomly selected textures applied to the wall planes as well as to cube, sphere, cylinder, and pill shapes.
Acknowledgements
The research leading to these results has received funding from EC Horizon 2020 for Research and Innovation under grant agreement No. 101017089, TraceBot and the Austrian Science Foundation (FWF) under grant agreement No. I3969-N30, InDex.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Schreiberhuber, S., Weibel, JB., Patten, T., Vincze, M. (2022). GigaDepth: Learning Depth from Structured Light with Branching Neural Networks. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13693. Springer, Cham. https://doi.org/10.1007/978-3-031-19827-4_13
DOI: https://doi.org/10.1007/978-3-031-19827-4_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19826-7
Online ISBN: 978-3-031-19827-4