DOI: 10.1145/3429341.3429355
research-article
Open access

Constant Velocity Constraints for Self-Supervised Monocular Depth Estimation

Published: 08 December 2020

Abstract

We present a new method for self-supervised monocular depth estimation. Contemporary monocular depth estimation methods use a triplet of consecutive video frames to estimate the depth image of the central frame. We assume that the ego-centric view progresses linearly through the scene, based on the kinematic and physical properties of the camera. During the training phase, we exploit this assumption to create a depth estimate for each image in the triplet. We then apply a new geometry constraint that supports novel synthetic views, thus providing a strong supervisory signal. Our contribution is simple to implement, requires no additional trainable parameters, and produces competitive results when compared with other state-of-the-art methods on the popular KITTI corpus.
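
The linear-motion assumption can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the pose convention (a 4x4 matrix mapping points from one camera frame into the next), and the intrinsics used below are all assumptions made for illustration. The idea shown is only that, if the camera moves with constant velocity, a single relative pose estimated between two consecutive frames can be reused for every pair in the triplet, which in turn allows any frame to be reprojected into any other.

```python
import numpy as np


def se3(yaw_rad, translation):
    """Build a 4x4 rigid transform from a rotation about z and a translation."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    T = np.eye(4)
    T[:3, :3] = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    T[:3, 3] = translation
    return T


def triplet_poses(T_prev_to_mid):
    """Derive the remaining triplet poses from one pose (assumed constant velocity).

    T_prev_to_mid maps points from the previous camera frame to the middle one.
    """
    T_mid_to_next = T_prev_to_mid.copy()           # same motion repeats
    T_prev_to_next = T_mid_to_next @ T_prev_to_mid  # two steps composed
    T_mid_to_prev = np.linalg.inv(T_prev_to_mid)    # one step backwards
    return T_mid_to_next, T_prev_to_next, T_mid_to_prev


def reproject(pixel, depth, K, T):
    """Back-project a pixel with its depth, move it by T, and project it again."""
    u, v = pixel
    point = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    moved = T[:3, :3] @ point + T[:3, 3]
    projected = K @ moved
    return projected[:2] / projected[2]


if __name__ == "__main__":
    # Hypothetical intrinsics and a camera moving 1 m forward per frame.
    K = np.array([[720.0, 0.0, 620.0], [0.0, 720.0, 187.0], [0.0, 0.0, 1.0]])
    T = se3(0.0, [0.0, 0.0, -1.0])
    _, T_prev_to_next, _ = triplet_poses(T)
    print(T_prev_to_next[:3, 3])
    print(reproject((1340.0, 187.0), 10.0, K, T))
```

With a depth estimate available for each frame, `reproject` could be applied per pixel to synthesize one view from another; how those synthetic views are compared during training is specific to the paper and is not reproduced here.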



Published In

CVMP '20: Proceedings of the 17th ACM SIGGRAPH European Conference on Visual Media Production
December 2020
46 pages
ISBN:9781450381987
DOI:10.1145/3429341
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Deep Learning
  2. Monocular Depth Estimation
  3. Self-supervised Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CVMP '20: European Conference on Visual Media Production
December 7 - 8, 2020
Virtual Event, United Kingdom

Acceptance Rates

Overall Acceptance Rate 40 of 67 submissions, 60%

Article Metrics

  • Downloads (Last 12 months): 199
  • Downloads (Last 6 weeks): 30
Reflects downloads up to 01 Mar 2025

Cited By

  • (2024) Monocular Depth Estimation via Self-Supervised Self-Distillation. Sensors 24:13 (4090). DOI: 10.3390/s24134090. Online publication date: 24-Jun-2024
  • (2024) Hybrid CNN and ViT for Self-Supervised Knowledge Distillation Monocular Depth Estimation Method. Modeling and Simulation 13:03 (2868-2880). DOI: 10.12677/mos.2024.133260. Online publication date: 2024
  • (2024) Color and Geometric Contrastive Learning Based Intra-Frame Supervision for Self-Supervised Monocular Depth Estimation. IEEE Signal Processing Letters 31 (2940-2944). DOI: 10.1109/LSP.2024.3480032. Online publication date: 2024
  • (2024) Dense monocular depth estimation for stereoscopic vision based on pyramid transformer and multi-scale feature fusion. Scientific Reports 14:1. DOI: 10.1038/s41598-024-57908-z. Online publication date: 25-Mar-2024
  • (2024) Multiple prior representation learning for self-supervised monocular depth estimation via hybrid transformer. Engineering Applications of Artificial Intelligence 135 (108790). DOI: 10.1016/j.engappai.2024.108790. Online publication date: Sep-2024
  • (2023) Practical Implementation of Visual Navigation Based on Semantic Segmentation for Human-Centric Environments. Journal of Robotics and Mechatronics 35:6 (1419-1434). DOI: 10.20965/jrm.2023.p1419. Online publication date: 20-Dec-2023
  • (2023) Self-Supervised Monocular Depth Estimation With Self-Reference Distillation and Disparity Offset Refinement. IEEE Transactions on Circuits and Systems for Video Technology 33:12 (7565-7577). DOI: 10.1109/TCSVT.2023.3275584. Online publication date: Dec-2023
  • (2023) Self-supervised monocular depth estimation based on combining convolution and multilayer perceptron. Engineering Applications of Artificial Intelligence 117 (105587). DOI: 10.1016/j.engappai.2022.105587. Online publication date: Jan-2023
  • (2023) DCU-NET: Self-supervised monocular depth estimation based on densely connected U-shaped convolutional neural networks. Computers & Graphics 111 (145-154). DOI: 10.1016/j.cag.2023.01.016. Online publication date: Apr-2023
  • (2022) MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer. 2022 International Conference on 3D Vision (3DV) (668-678). DOI: 10.1109/3DV57658.2022.00077. Online publication date: Sep-2022