Improved Bilinear Pooling for Real-Time Pose Event Camera Relocalisation

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14233)

Included in the following conference series:

International Conference on Image Analysis and Processing

Abstract

In simultaneous localization and mapping (SLAM), traditional camera pose estimation methods have been replaced by more advanced camera relocalization methods that combine CNNs and LSTMs. However, the reliance of these methods on LSTM layers can lead to overfitting and slow convergence. This paper presents a novel deep learning approach for estimating the six-degree-of-freedom (6DOF) pose of an event camera. Our method first preprocesses the events captured by the event camera to generate a set of images. These images are then passed through two CNNs to extract relevant features. The two sets of features are combined with an outer product and aggregated across different regions of the image, with L2 normalization applied to the combined vector. The final step of the model is a regression layer that predicts the position and orientation of the event camera. The approach has been evaluated on various datasets, and the results show that it outperforms existing state-of-the-art methods.
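
The pooling and regression steps described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `bilinear_pool` and `regress_pose` are hypothetical, the sketch assumes that L2 normalization is applied after sum pooling over locations (the usual ordering in bilinear CNN models), and it assumes a 7-value output made of a 3D position and a unit quaternion, which the abstract does not specify.

```python
import math

def bilinear_pool(feats_a, feats_b, eps=1e-12):
    """Sum-pool per-location outer products of two feature sets, then L2-normalize.

    feats_a: list of N vectors of length Ca (one per spatial location, first CNN)
    feats_b: list of N vectors of length Cb (same N locations, second CNN)
    Returns a flat list of length Ca * Cb with (approximately) unit L2 norm.
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("both feature maps must cover the same locations")
    ca, cb = len(feats_a[0]), len(feats_b[0])
    pooled = [0.0] * (ca * cb)
    for a, b in zip(feats_a, feats_b):
        # Outer product a b^T at this location, accumulated over all locations.
        for i in range(ca):
            for j in range(cb):
                pooled[i * cb + j] += a[i] * b[j]
    # Assumed ordering: normalize the pooled (combined) vector.
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / (norm + eps) for v in pooled]

def regress_pose(descriptor, weights, bias):
    """Linear regression layer; output layout (xyz + quaternion) is an assumption."""
    out = [sum(w * d for w, d in zip(row, descriptor)) + b
           for row, b in zip(weights, bias)]
    position, quat = out[:3], out[3:]
    # Rescale the orientation part to unit length; guard against a zero vector.
    qn = math.sqrt(sum(q * q for q in quat)) or 1.0
    return position, [q / qn for q in quat]

# Toy input: two spatial locations, two channels from each CNN.
feats_a = [[1.0, 0.0], [0.0, 2.0]]
feats_b = [[3.0, 0.0], [0.0, 4.0]]
descriptor = bilinear_pool(feats_a, feats_b)

# Placeholder (untrained) regression parameters: zero weights, fixed bias.
weights = [[0.0] * len(descriptor) for _ in range(7)]
bias = [1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 2.0]
position, orientation = regress_pose(descriptor, weights, bias)
```

In practice the two feature sets would come from CNN feature maps and the regression weights would be learned. The nested loops are kept only to make the outer-product accumulation explicit.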

Author information

Authors and Affiliations

IBISC, University Evry, Universite Paris-Saclay, 91025, Evry, France
Ahmed Tabia, Fabien Bonardi & Samia Bouchafa

Corresponding authors

Correspondence to Fabien Bonardi or Samia Bouchafa.

Editor information

Editors and Affiliations

University of Udine, Udine, Italy
Gian Luca Foresti
University of Udine, Udine, Italy
Andrea Fusiello
University of York, York, UK
Edwin Hancock

About this paper

Cite this paper

Tabia, A., Bonardi, F., Bouchafa, S. (2023). Improved Bilinear Pooling for Real-Time Pose Event Camera Relocalisation. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing – ICIAP 2023. ICIAP 2023. Lecture Notes in Computer Science, vol 14233. Springer, Cham. https://doi.org/10.1007/978-3-031-43148-7_19

DOI: https://doi.org/10.1007/978-3-031-43148-7_19
Published: 05 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43147-0
Online ISBN: 978-3-031-43148-7
eBook Packages: Computer Science, Computer Science (R0)
