GEN-SLAM: Generative Modeling for Monocular Simultaneous Localization and Mapping
Pages 147–153
Abstract
We present a deep-learning-based system for the twin tasks of localization and obstacle avoidance essential to any mobile robot. Our system learns from conventional geometric SLAM and, using a single camera, outputs the topological pose of the camera in an environment and a depth map of the obstacles around it. We use a CNN to localize in a topological map, and a conditional VAE to output depth for a camera image, conditioned on this topological location estimate. We demonstrate the effectiveness of our monocular localization and depth estimation system on simulated and real datasets.
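To make the two-stage pipeline described in the abstract concrete, the sketch below pairs a small CNN classifier over topological nodes with a conditional VAE that decodes a depth map conditioned on the predicted node. This is a minimal PyTorch illustration under assumed settings (64x64 inputs, K=32 topological nodes, a 64-dimensional latent, and the hypothetical names TopoLocalizer and CondDepthVAE); it is not the authors' architecture or training setup.

# Hypothetical sketch of the two-stage pipeline from the abstract: a CNN
# classifies the camera image into one of K topological nodes, and a
# conditional VAE decodes a depth map conditioned on that node. Layer sizes,
# names, and the 64x64 resolution are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F

K = 32   # number of topological nodes (assumed)
Z = 64   # latent dimension (assumed)

class TopoLocalizer(nn.Module):
    """CNN that predicts a distribution over topological nodes."""
    def __init__(self, k=K):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 16 -> 8
        )
        self.head = nn.Linear(128 * 8 * 8, k)

    def forward(self, rgb):
        h = self.features(rgb).flatten(1)
        return self.head(h)   # logits over K topological nodes

class CondDepthVAE(nn.Module):
    """Conditional VAE: encodes RGB, decodes depth, conditioned on the node."""
    def __init__(self, k=K, z=Z):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.to_mu = nn.Linear(128 * 8 * 8 + k, z)
        self.to_logvar = nn.Linear(128 * 8 * 8 + k, z)
        self.from_z = nn.Linear(z + k, 128 * 8 * 8)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, rgb, node_onehot):
        # Condition both the latent inference and the decoder on the node.
        h = torch.cat([self.enc(rgb).flatten(1), node_onehot], dim=1)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        d = self.from_z(torch.cat([z, node_onehot], dim=1)).view(-1, 128, 8, 8)
        depth = self.dec(d)   # depth map normalized to [0, 1] (assumed)
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1).mean()
        return depth, kl

if __name__ == "__main__":
    rgb = torch.randn(2, 3, 64, 64)   # dummy camera images
    node = F.one_hot(TopoLocalizer()(rgb).argmax(1), K).float()
    depth, kl = CondDepthVAE()(rgb, node)
    print(depth.shape, kl.item())     # torch.Size([2, 1, 64, 64])

At test time, only the localizer and the VAE decoder would be needed: the predicted node conditions the decoder, and the reconstruction plus KL terms above stand in for the training objective of a standard conditional VAE.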
Publication Information
Published In: May 2019 (proceedings, 7095 pages). Copyright © 2019.
Publisher: IEEE Press
Published: 20 May 2019
Article type: Research article