Constant Velocity Constraints for Self-Supervised Monocular Depth Estimation

Authors:

David Greenwood,

Han Gong

CVMP '20: Proceedings of the 17th ACM SIGGRAPH European Conference on Visual Media Production

Article No.: 5, Pages 1 - 8

https://doi.org/10.1145/3429341.3429355

Published: 08 December 2020

Abstract

We present a new method for self-supervised monocular depth estimation. Contemporary monocular depth estimation methods use a triplet of consecutive video frames to estimate the central depth image. We make the assumption that the ego-centric view progresses linearly in the scene, based on the kinematic and physical properties of the camera. During the training phase, we can exploit this assumption to create a depth estimation for each image in the triplet. We then apply a new geometry constraint that supports novel synthetic views, thus providing a strong supervisory signal. Our contribution is simple to implement, requires no additional trainable parameter, and produces competitive results when compared with other state-of-the-art methods on the popular KITTI corpus.
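The constant-velocity assumption in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes camera motion is represented as a 4x4 rigid transform that maps points from a source camera frame into a target camera frame, and the names `constant_velocity_poses`, `reproject`, and the intrinsics `K` are hypothetical. Under the assumption, the motion between the first and second frames is reused for the second and third frames, and composing it with itself relates the outer two frames. The reprojection helper shows how a pixel with a known depth would land in a synthesized view.

```python
import numpy as np


def constant_velocity_poses(pose_prev_to_mid):
    """Derive the remaining triplet poses from one estimated pose.

    Assumption (illustrative): the camera moves with constant velocity, so the
    transform from the middle frame to the next frame equals the transform
    from the previous frame to the middle frame.
    """
    pose_mid_to_next = pose_prev_to_mid.copy()
    # Chaining the two identical steps relates the outer frames of the triplet.
    pose_prev_to_next = pose_mid_to_next @ pose_prev_to_mid
    return pose_mid_to_next, pose_prev_to_next


def reproject(pixel, depth, K, pose):
    """Map a source pixel with known depth into the target image.

    Back-projects the pixel to a 3D point in the source camera frame, applies
    the rigid transform, and projects with the pinhole intrinsics K.
    """
    u, v = pixel
    point_src = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    point_tgt = pose[:3, :3] @ point_src + pose[:3, 3]
    uvw = K @ point_tgt
    return uvw[:2] / uvw[2]


# Hypothetical intrinsics and a forward motion of one unit per frame:
# scene points move one unit closer along the optical axis.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
pose_prev_to_mid = np.eye(4)
pose_prev_to_mid[2, 3] = -1.0

pose_mid_to_next, pose_prev_to_next = constant_velocity_poses(pose_prev_to_mid)
print(reproject((420.0, 240.0), 10.0, K, pose_prev_to_next))
```

In a training loop, a per-pixel version of `reproject` would typically drive a differentiable image warp whose photometric error supplies the supervisory signal; that step is omitted here because the page does not specify the loss.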

Information & Contributors

Information

Published In

CVMP '20: Proceedings of the 17th ACM SIGGRAPH European Conference on Visual Media Production

December 2020

46 pages

ISBN: 9781450381987

DOI: 10.1145/3429341

Copyright © 2020 ACM.

Sponsors

SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

CVMP '20

Sponsor:

SIGGRAPH

CVMP '20: European Conference on Visual Media Production

December 7 - 8, 2020

Virtual Event, United Kingdom

Acceptance Rates

Overall Acceptance Rate 40 of 67 submissions, 60%

Bibliometrics & Citations

Bibliometrics

Total Citations: 12

Total Downloads: 871

Downloads (Last 12 months): 199

Downloads (Last 6 weeks): 30

Reflects downloads up to 01 Mar 2025

Citations

Cited By

Hu H, Feng Y, Li D, Zhang S, Zhao H. (2024). Monocular Depth Estimation via Self-Supervised Self-Distillation. Sensors 24:13 (4090). Online publication date: 24-Jun-2024
https://doi.org/10.3390/s24134090
郑千. (2024). Hybrid CNN and ViT for Self-Supervised Knowledge Distillation Monocular Depth Estimation Method. Modeling and Simulation 13:03 (2868-2880). Online publication date: 2024
https://doi.org/10.12677/mos.2024.133260
Gao Y, Wu X, Li S, Cai X, Li C. (2024). Color and Geometric Contrastive Learning Based Intra-Frame Supervision for Self-Supervised Monocular Depth Estimation. IEEE Signal Processing Letters 31 (2940-2944). Online publication date: 2024
https://doi.org/10.1109/LSP.2024.3480032
Xia Z, Wu T, Wang Z, Zhou M, Wu B, Chan C, Kong L. (2024). Dense monocular depth estimation for stereoscopic vision based on pyramid transformer and multi-scale feature fusion. Scientific Reports 14:1. Online publication date: 25-Mar-2024
https://doi.org/10.1038/s41598-024-57908-z
Sun G, Liu J, Liu M, Liu M, Zhang Y. (2024). Multiple prior representation learning for self-supervised monocular depth estimation via hybrid transformer. Engineering Applications of Artificial Intelligence 135 (108790). Online publication date: Sep-2024
https://doi.org/10.1016/j.engappai.2024.108790
Adachi M, Honda K, Xue J, Sudo H, Ueda Y, Yuda Y, Wada M, Miyamoto R. (2023). Practical Implementation of Visual Navigation Based on Semantic Segmentation for Human-Centric Environments. Journal of Robotics and Mechatronics 35:6 (1419-1434). Online publication date: 20-Dec-2023
https://doi.org/10.20965/jrm.2023.p1419
Liu Z, Li R, Shao S, Wu X, Chen W. (2023). Self-Supervised Monocular Depth Estimation With Self-Reference Distillation and Disparity Offset Refinement. IEEE Transactions on Circuits and Systems for Video Technology 33:12 (7565-7577). Online publication date: Dec-2023
https://doi.org/10.1109/TCSVT.2023.3275584
Zheng Q, Yu T, Wang F. (2023). Self-supervised monocular depth estimation based on combining convolution and multilayer perceptron. Engineering Applications of Artificial Intelligence 117 (105587). Online publication date: Jan-2023
https://doi.org/10.1016/j.engappai.2022.105587
Zheng Q, Yu T, Wang F. (2023). DCU-NET: Self-supervised monocular depth estimation based on densely connected U-shaped convolutional neural networks. Computers & Graphics 111 (145-154). Online publication date: Apr-2023
https://doi.org/10.1016/j.cag.2023.01.016
Zhao C, Zhang Y, Poggi M, Tosi F, Guo X, Zhu Z, Huang G, Tang Y, Mattoccia S. (2022). MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer. 2022 International Conference on 3D Vision (3DV) (668-678). Online publication date: Sep-2022
https://doi.org/10.1109/3DV57658.2022.00077
