GEN-SLAM: Generative Modeling for Monocular Simultaneous Localization and Mapping
Pages 147–153
Abstract
We present a deep-learning-based system for the twin tasks of localization and obstacle avoidance essential to any mobile robot. Our system learns from conventional geometric SLAM and, using a single camera, outputs the topological pose of the camera in an environment and a depth map of the obstacles around it. We use a CNN to localize in a topological map, and a conditional VAE to output depth for a camera image, conditioned on this topological location estimate. We demonstrate the effectiveness of our monocular localization and depth estimation system on simulated and real datasets.
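To make the two-stage pipeline described in the abstract concrete, the sketch below pairs a small CNN classifier over topological nodes with a conditional VAE that decodes a depth map conditioned on the predicted node. This is a minimal PyTorch illustration under assumed settings (64x64 inputs, K=32 topological nodes, a 64-dimensional latent, and the hypothetical names TopoLocalizer and CondDepthVAE); it is not the authors' architecture or training setup.

# Hypothetical sketch of the two-stage pipeline from the abstract: a CNN
# classifies the camera image into one of K topological nodes, and a
# conditional VAE decodes a depth map conditioned on that node. Layer sizes,
# names, and the 64x64 resolution are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F

K = 32   # number of topological nodes (assumed)
Z = 64   # latent dimension (assumed)

class TopoLocalizer(nn.Module):
    """CNN that predicts a distribution over topological nodes."""
    def __init__(self, k=K):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 16 -> 8
        )
        self.head = nn.Linear(128 * 8 * 8, k)

    def forward(self, rgb):
        h = self.features(rgb).flatten(1)
        return self.head(h)   # logits over K topological nodes

class CondDepthVAE(nn.Module):
    """Conditional VAE: encodes RGB, decodes depth, conditioned on the node."""
    def __init__(self, k=K, z=Z):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.to_mu = nn.Linear(128 * 8 * 8 + k, z)
        self.to_logvar = nn.Linear(128 * 8 * 8 + k, z)
        self.from_z = nn.Linear(z + k, 128 * 8 * 8)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, rgb, node_onehot):
        # Condition both the latent inference and the decoder on the node.
        h = torch.cat([self.enc(rgb).flatten(1), node_onehot], dim=1)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        d = self.from_z(torch.cat([z, node_onehot], dim=1)).view(-1, 128, 8, 8)
        depth = self.dec(d)   # depth map normalized to [0, 1] (assumed)
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()
        return depth, kl

if __name__ == "__main__":
    rgb = torch.randn(2, 3, 64, 64)   # dummy camera images
    node = F.one_hot(TopoLocalizer()(rgb).argmax(1), K).float()
    depth, kl = CondDepthVAE()(rgb, node)
    print(depth.shape, kl.item())     # torch.Size([2, 1, 64, 64])

At test time, only the localizer and the VAE decoder would be needed: the predicted node conditions the decoder, and the reconstruction plus KL terms above stand in for the training objective of a standard conditional VAE.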
Publication Information
Published In: May 2019 (proceedings, 7095 pages). Copyright © 2019.
Publisher: IEEE Press
Published: 20 May 2019
Article type: Research article