Abstract
We present a novel end-to-end visual odometry architecture with guided feature selection, based on deep convolutional recurrent neural networks. Unlike current monocular visual odometry methods, our approach builds on the intuition that features contribute differently to different motion patterns. Specifically, we propose a dual-branch recurrent network that learns rotation and translation separately, leveraging a Convolutional Neural Network (CNN) for feature representation and a Recurrent Neural Network (RNN) for reasoning over image sequences. To enhance feature selection, we further introduce an effective context-aware guidance mechanism that forces each branch to explicitly distill the information relevant to its specific motion pattern. Experiments on the prevalent KITTI and ICL_NUIM benchmarks demonstrate that our method outperforms current state-of-the-art model-based and learning-based methods for both decoupled and joint camera pose recovery.
Supported by the National Key Research and Development Program of China (2017YFB1002601) and National Natural Science Foundation of China (61632003, 61771026).
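To make the dual-branch design concrete, below is a minimal PyTorch-style sketch of the idea described in the abstract: a shared CNN encodes each frame pair, and separate recurrent branches regress rotation and translation, each with a context-aware gate that re-weights the shared features using that branch's previous hidden state. The module names, layer sizes, and the exact form of the guidance gate are illustrative assumptions, not the authors' architecture.

```python
# A minimal sketch of a dual-branch recurrent VO network with
# context-aware feature guidance. All sizes and the sigmoid-gate
# formulation are assumptions for illustration only.
import torch
import torch.nn as nn

class GuidedBranch(nn.Module):
    """One recurrent branch (rotation or translation). A gate conditioned
    on the branch's previous hidden state re-weights the shared CNN
    features before they enter the LSTM cell."""
    def __init__(self, feat_dim=512, hidden_dim=256, out_dim=3):
        super().__init__()
        self.gate = nn.Sequential(             # guidance: context -> feature weights
            nn.Linear(hidden_dim, feat_dim), nn.Sigmoid())
        self.rnn = nn.LSTMCell(feat_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, out_dim)

    def forward(self, feats, state):
        h, c = state
        guided = feats * self.gate(h)          # select features for this motion pattern
        h, c = self.rnn(guided, (h, c))
        return self.head(h), (h, c)

class DualBranchVO(nn.Module):
    """Shared CNN encoder feeding separate rotation/translation branches."""
    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(          # stand-in for a FlowNet-style encoder
            nn.Conv2d(6, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rot = GuidedBranch(feat_dim, hidden_dim, out_dim=3)
        self.trans = GuidedBranch(feat_dim, hidden_dim, out_dim=3)
        self.hidden_dim = hidden_dim

    def forward(self, pairs):                  # pairs: (B, T, 6, H, W) stacked frame pairs
        B, T = pairs.shape[:2]
        zeros = lambda: (pairs.new_zeros(B, self.hidden_dim),
                         pairs.new_zeros(B, self.hidden_dim))
        state_r, state_t, poses = zeros(), zeros(), []
        for t in range(T):
            feats = self.encoder(pairs[:, t])
            rot, state_r = self.rot(feats, state_r)
            trans, state_t = self.trans(feats, state_t)
            poses.append(torch.cat([rot, trans], dim=-1))
        return torch.stack(poses, dim=1)       # (B, T, 6) relative poses

model = DualBranchVO()
out = model(torch.randn(2, 5, 6, 64, 64))      # toy sequence of 5 frame pairs
print(out.shape)                               # torch.Size([2, 5, 6])
```

Keeping separate gates and recurrent states per branch is what lets each branch accumulate its own temporal context and select the features most informative for its motion component, rather than forcing rotation and translation to share one representation.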
© 2019 Springer Nature Switzerland AG
Cite this paper
Xue, F., Wang, Q., Wang, X., Dong, W., Wang, J., Zha, H. (2019). Guided Feature Selection for Deep Visual Odometry. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol 11366. Springer, Cham. https://doi.org/10.1007/978-3-030-20876-9_19
DOI: https://doi.org/10.1007/978-3-030-20876-9_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20875-2
Online ISBN: 978-3-030-20876-9
eBook Packages: Computer Science (R0)