Abstract
Traditional methods for estimating camera pose have been replaced by more advanced camera relocalization methods that utilize both CNNs and LSTMs in the field of simultaneous localization and mapping. However, the reliance on LSTM layers in these methods can lead to overfitting and slow convergence. In this paper, a novel approach for estimating the six degree of freedom (6DOF) pose of an event camera using deep learning is presented. Our method begins by preprocessing the events captured by the event camera to generate a set of images. These images are then passed through two CNNs to extract relevant features. These features are multiplied using an outer product and aggregated across different regions of the image after adding L2 normalization to normalize the combining vector. The final step of the model is a regression layer that predicts the position and orientation of the event camera. The effectiveness of this approach has been tested on various datasets, and the results demonstrate its superiority compared to existing state-of-the-art methods.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Badrinarayanan, Vijay, Kendall, Alex, Cipolla, Roberto: SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017)
Clevert, D.-A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289 (2015)
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
Eitel, A., Springenberg, J.T., Spinello, L., Riedmiller, M., Burgard, W.: Multimodal deep learning for robust RGB-D object recognition. In: 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 681–687. IEEE (2015)
Gallego, Guillermo, Scaramuzza, Davide: Accurate angular velocity estimation with an event camera. IEEE Robot. Autom. Lett. 2(2), 632–639 (2017)
Kendall, A., Cipolla, R.: Modelling uncertainty in deep learning for camera relocalization. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 4762–4769. IEEE (2016)
Kendall, A., Grimes, M., Cipolla, R.: PoseNet: a convolutional network for real-time 6-DOF camera relocalization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2938–2946 (2015)
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Lepetit, Vincent, Moreno-Noguer, Francesc, Fua, Pascal: EPnP: an accurate o(n) solution to the PnP problem. Int. J. Comput. Vis. 81(2), 155–166 (2009)
Li, Ming, Chen, Ruizhi, Liao, Xuan, Guo, Bingxuan, Zhang, Weilong, Guo, Ge.: A precise indoor visual positioning approach using a built image feature database and single user image from smartphone cameras. Remote Sens. 12(5), 869 (2020)
Lin, T.-Y., RoyChowdhury, A., Maji, S.: Bilinear CNN models for fine-grained visual recognition. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1449–1457 (2015)
Mahajan, D., et al.: Exploring the limits of weakly supervised pretraining. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 181–196 (2018)
Mueggler, Elias, Rebecq, Henri, Gallego, Guillermo, Delbruck, Tobi, Scaramuzza, Davide: The event-camera dataset and simulator: event-based data for pose estimation, visual odometry, and slam. The Int. J. Robot. Res. 36(2), 142–149 (2017)
Mur-Artal, R., Tardós, J.D.: ORB-SLAM2: an open-source slam system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robot. 33(5), 1255–1262 (2017)
Nguyen, A., Do, T.-T., Caldwell, D.G., Tsagarakis, N.G.: Real-time 6DOF pose relocalization for event cameras with stacked spatial LSTM networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 0–0 (2019)
Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32, 8024–8035 (2019)
Qu, C., Shivakumar, S.S., Miller, I.D., Taylor, C.J.: DSOL: A fast direct sparse odometry scheme. arXiv preprint arXiv:2203.08182 (2022)
Rebecq, H., Horstschaefer, T., Scaramuzza, D.: Real-time visual-inertial odometry for event cameras using keyframe-based nonlinear optimization (2017)
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.-C.: MobileNetv2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tabia, A., Bonardi, F., Bouchafa, S. (2023). Improved Bilinear Pooling for Real-Time Pose Event Camera Relocalisation. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing – ICIAP 2023. ICIAP 2023. Lecture Notes in Computer Science, vol 14233. Springer, Cham. https://doi.org/10.1007/978-3-031-43148-7_19
Download citation
DOI: https://doi.org/10.1007/978-3-031-43148-7_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43147-0
Online ISBN: 978-3-031-43148-7
eBook Packages: Computer ScienceComputer Science (R0)