Abstract
We present a novel end-to-end visual odometry architecture with guided feature selection, based on deep convolutional recurrent neural networks. Unlike current monocular visual odometry methods, our approach builds on the intuition that features contribute differently to different motion patterns. Specifically, we propose a dual-branch recurrent network that learns rotation and translation separately, leveraging a Convolutional Neural Network (CNN) for feature representation and a Recurrent Neural Network (RNN) for reasoning over image sequences. To enhance feature selection, we further introduce an effective context-aware guidance mechanism that forces each branch to explicitly distill the information relevant to its specific motion pattern. Experiments on the prevalent KITTI and ICL_NUIM benchmarks demonstrate that our method outperforms current state-of-the-art model-based and learning-based methods for both decoupled and joint camera pose recovery.
Supported by the National Key Research and Development Program of China (2017YFB1002601) and National Natural Science Foundation of China (61632003, 61771026).
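To make the dual-branch design concrete, below is a minimal PyTorch-style sketch of the idea described in the abstract: a shared CNN encodes each frame pair, and separate recurrent branches regress rotation and translation, each with a context-aware gate that re-weights the shared features using that branch's previous hidden state. The module names, layer sizes, and the exact form of the guidance gate are illustrative assumptions, not the authors' architecture.

```python
# A minimal sketch of a dual-branch recurrent VO network with
# context-aware feature guidance. All sizes and the sigmoid-gate
# formulation are assumptions for illustration only.
import torch
import torch.nn as nn

class GuidedBranch(nn.Module):
    """One recurrent branch (rotation or translation). A gate conditioned
    on the branch's previous hidden state re-weights the shared CNN
    features before they enter the LSTM cell."""
    def __init__(self, feat_dim=512, hidden_dim=256, out_dim=3):
        super().__init__()
        self.gate = nn.Sequential(             # guidance: context -> feature weights
            nn.Linear(hidden_dim, feat_dim), nn.Sigmoid())
        self.rnn = nn.LSTMCell(feat_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, out_dim)

    def forward(self, feats, state):
        h, c = state
        guided = feats * self.gate(h)          # select features for this motion pattern
        h, c = self.rnn(guided, (h, c))
        return self.head(h), (h, c)

class DualBranchVO(nn.Module):
    """Shared CNN encoder feeding separate rotation/translation branches."""
    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(          # stand-in for a FlowNet-style encoder
            nn.Conv2d(6, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rot = GuidedBranch(feat_dim, hidden_dim, out_dim=3)
        self.trans = GuidedBranch(feat_dim, hidden_dim, out_dim=3)
        self.hidden_dim = hidden_dim

    def forward(self, pairs):                  # pairs: (B, T, 6, H, W) stacked frame pairs
        B, T = pairs.shape[:2]
        zeros = lambda: (pairs.new_zeros(B, self.hidden_dim),
                         pairs.new_zeros(B, self.hidden_dim))
        state_r, state_t, poses = zeros(), zeros(), []
        for t in range(T):
            feats = self.encoder(pairs[:, t])
            rot, state_r = self.rot(feats, state_r)
            trans, state_t = self.trans(feats, state_t)
            poses.append(torch.cat([rot, trans], dim=-1))
        return torch.stack(poses, dim=1)       # (B, T, 6) relative poses

model = DualBranchVO()
out = model(torch.randn(2, 5, 6, 64, 64))      # toy sequence of 5 frame pairs
print(out.shape)                               # torch.Size([2, 5, 6])
```

Keeping separate gates and recurrent states per branch is what lets each branch accumulate its own temporal context and select the features most informative for its motion component, rather than forcing rotation and translation to share one representation.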
© 2019 Springer Nature Switzerland AG
Cite this paper
Xue, F., Wang, Q., Wang, X., Dong, W., Wang, J., Zha, H. (2019). Guided Feature Selection for Deep Visual Odometry. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol 11366. Springer, Cham. https://doi.org/10.1007/978-3-030-20876-9_19
DOI: https://doi.org/10.1007/978-3-030-20876-9_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20875-2
Online ISBN: 978-3-030-20876-9
eBook Packages: Computer Science (R0)