Optimizing Appearance-Based Localization with Catadioptric Cameras: Small-Footprint Models for Real-Time Inference on Edge Devices
Figure 1. Overview of the proposed method: a flowchart of the appearance-based localization system. The service robot is shown with the latest, larger-field-of-view catadioptric camera, but without the perspective camera, which is not used in this research.
Figure 2. Diagram of the CNN-based image description blocks that produce embeddings used as global descriptors in the localization system. The global map is built from n_i images (i_1, ..., i_{n_i}) converted to embedding vectors d_i that are stored in the map D_embeddings of n_i embeddings (global descriptors). Note that panoramic images can be used instead of the omnidirectional ones.
Figure 3. Labbot mobile robot with the integrated sensor featuring a catadioptric camera (a); robot paths during image collection, with different colours indicating different paths (b).
Figure 4. Omnidirectional images of different locations (a,b) in the Mechatronics Centre and an example image after masking (c).
Figure 5. Blueprint of the first floor (a) and third floor (b) of the Mechatronics Centre building, with marked places (blue crosses) where images were taken.
Figure 6. Example of omnidirectional image augmentation: (a) original picture; (b,c) augmented images.
Figure 7. Maps of the two parts of the laboratory in Freiburg with approximate paths followed by the robot during data acquisition (map and trajectory data adopted from the COLD dataset web page: https://www.cas.kth.se/COLD/cold-freiburg.html).
Figure 8. Example images from the COLD Freiburg dataset: (a) one-person office (1PO-A); (b) kitchen (KT-A); (c) stairs area (ST-A); (d) printer area (PA-A).
Figure 9. Model training results in Experiment 1.
Figure 10. Confusion matrix for 17 sections.
Figure 11. Results of sample section predictions. The image in the first column is the query; the other columns show the four closest neighbours. The section number is given in square brackets (e.g., [12], [02]), followed by the L2 distance between the query and the presented image. An example of (a) correct place recognition and (b) mismatched sections with slightly overlapping ranges.
Figure 12. Quantitative results for Configuration A: (a) percentage of matches within a given range of the actual distance (the units on the x-axis are distance ranges); (b) average distance measurement error.
Figure 13. Quantitative results for Configuration B: (a) percentage of matches within a given range of the actual distance (the units on the x-axis are distance ranges); (b) average distance measurement error.
Figure 14. Quantitative results for Configuration C: (a) percentage of matches within a given range of the actual distance (the units on the x-axis are distance ranges); (b) average distance measurement error.
Figure 15. Success ratio for EfficientNetV2L with the set of embeddings acquired from the training set for the room search task on the COLD Freiburg dataset. Results obtained under cloudy (blue), night (black), and sunny (yellow) conditions for the model trained on: (a) a set of images from cloudy days; (b) a set of images from cloudy days extended by missing acquisition locations found in images taken on sunny days and at night; (c) a balanced set of images obtained on cloudy and sunny days and at night. Average localization error in meters for: (d) a set of images from cloudy days; (e) a set of images from cloudy days extended by missing acquisition locations found in images taken on sunny days and at night; (f) a balanced set of images obtained on cloudy and sunny days and at night.
Figure 16. Success ratio for EfficientNetV2L with the set of embeddings acquired from the training set for the room search task on the COLD Saarbrücken dataset, part B. Results obtained under cloudy (blue), night (black), and sunny (yellow) conditions for the model trained on a balanced set of images obtained on cloudy and sunny days and at night (a). Average localization error in meters for the balanced set of images obtained on cloudy and sunny days and at night (b).
Abstract
1. Introduction
- Experimental analysis of neural network architectures in search of one that makes an image-based place recognition system suitable for implementation on the power- and resource-constrained embedded computer of an intelligent vision sensor.
- Experimental verification that catadioptric camera images can be used in the appearance-based localization task without unwarping them into panoramic form, which significantly reduces the computational load.
- Analysis of strategies for creating training sets in the place recognition task, assuming that the obtained solution should generalize to different image acquisition conditions, which depend mainly on illumination.
- A novel, simple yet efficient CNN-based architecture for the appearance-based localization system that leverages a lightweight CNN backbone, trained using transfer learning, to produce the embeddings, and the k-nearest neighbours method to quickly find an embedding matching the current perception (a minimal sketch of this pipeline is given after this list).
- A thorough experimental investigation of this architecture, considering several candidate backbone networks and both omnidirectional and panoramic images used to produce the embeddings. The experiments were conducted on three different datasets: two collected with variants of our bioinspired sensor and one publicly available.
- An investigation of the strategies for creating the training set and the reference map for the localization system, conducted on the COLD Freiburg dataset. This part of our research allowed us to test how our neural network model generalizes to images acquired under different lighting/weather conditions and resulted in the recommendation to use data balanced with respect to their acquisition conditions, which improves generalization.
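To make the embedding-plus-retrieval idea concrete, the minimal sketch below builds a reference map of global descriptors and then localizes a query image by k-nearest-neighbour search. It is an illustration, not the authors' implementation: it assumes a torchvision EfficientNetV2-L backbone pretrained on ImageNet with the classification head replaced by an identity layer, the FAISS library for the search, and hypothetical variables `map_images`, `map_labels`, and `query_image` holding the reference images, their place labels, and the current camera frame.

```python
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
import faiss  # library for fast nearest-neighbour search over descriptor databases

# Assumed backbone: EfficientNetV2-L pretrained on ImageNet, with the classifier
# replaced by an identity so the pooled 1280-D feature vector becomes the embedding.
backbone = models.efficientnet_v2_l(weights="IMAGENET1K_V1")
backbone.classifier = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.ToTensor(),
    T.Resize((480, 480)),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(image):
    """Return an L2-normalized global descriptor for one RGB image (H x W x 3, uint8)."""
    x = preprocess(image).unsqueeze(0)          # (1, 3, 480, 480)
    d = backbone(x)                             # (1, 1280) pooled feature vector
    return torch.nn.functional.normalize(d, dim=1).cpu().numpy().astype("float32")

# Build the reference map of embeddings from the map images (hypothetical inputs).
map_embeddings = np.vstack([embed(img) for img in map_images])   # (n_i, 1280)
index = faiss.IndexFlatL2(map_embeddings.shape[1])                # exact L2 search
index.add(map_embeddings)

# Localize a query: the label of the nearest embedding is the recognized place.
distances, neighbours = index.search(embed(query_image), 4)
predicted_place = map_labels[neighbours[0, 0]]
```

Swapping `IndexFlatL2` for a KD-tree or another approximate index only changes the retrieval step; the map of embeddings itself stays the same.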
2. Related Work
3. Localization System Architecture
4. Experiments
4.1. Experiment 1: Integrated Sensor on a Mobile Robot
- Raw catadioptric images were used (cf. Figure 4) without converting them to panoramic images (a masking sketch follows this list).
- The neural network used to produce the embeddings was EfficientNet, which was selected based on an analysis of the literature.
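For reference, a minimal sketch of such masking with OpenCV is given below; the mirror centre, the two radii, and the file names are hypothetical placeholder values rather than calibration data from the paper.

```python
import cv2
import numpy as np

def mask_omnidirectional(image, center, r_outer, r_inner):
    """Keep only the annular mirror region of a raw catadioptric frame.

    Pixels outside the mirror's outer radius and inside the central dead zone
    (the camera's self-reflection) are blanked, so only the scene remains visible.
    """
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    cv2.circle(mask, center, r_outer, 255, thickness=-1)  # keep the mirror disc
    cv2.circle(mask, center, r_inner, 0, thickness=-1)    # blank the central dead zone
    return cv2.bitwise_and(image, image, mask=mask)

# Placeholder calibration values; a real system would read them from its calibration.
frame = cv2.imread("omni_frame.png")
masked = mask_omnidirectional(frame, center=(320, 240), r_outer=230, r_inner=40)
cv2.imwrite("omni_frame_masked.png", masked)
```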
4.2. Experiment 2: Stand-Alone Catadioptric Camera
- Configuration A—the entire dataset was divided into a training set (), a validation set (), and a test set () for each place. The validation set was then used as the reference database of embeddings.
- Configuration B—the entire dataset was divided into a training set (), a validation set (), and a test set () in such a way that the locations next to the places represented in the test set were always represented in the map of embeddings. The global map of embeddings was created from a combination of the training and validation sets, but the places from the test set, used as queries, were not directly represented in the map (a splitting sketch follows this list).
- Configuration C—all images of the places located on the first floor were divided into a training set () and a validation set (). The set of images recorded on the third floor was used to test the proposed solution. The 106 places for which images were recorded on the third floor were divided into the database of embeddings () and a test set used as queries (), in such a way that the locations next to the places included in the test set were represented in the map of embeddings.
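A minimal sketch of the Configuration B splitting idea is given below. It assumes an ordered list of acquisition places along the robot path and an illustrative `test_fraction` parameter; the actual split sizes differ and are reported above.

```python
def split_places_config_b(place_ids, test_fraction=0.2):
    """Hold out a subset of places as queries while keeping their neighbours in the map.

    `place_ids` is assumed to be the ordered list of acquisition locations along the
    robot path. Taking every k-th place as a query guarantees that the locations next
    to each query remain represented in the reference map of embeddings, which is the
    defining property of Configuration B.
    """
    k = max(2, round(1.0 / test_fraction))
    queries = place_ids[::k]                    # held-out places, used only as queries
    held_out = set(queries)
    reference = [p for p in place_ids if p not in held_out]  # places kept in the map
    return reference, queries

# Example: 106 places along a corridor, every fifth one becomes a query.
reference_places, query_places = split_places_config_b(list(range(106)))
```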
4.3. Experiment 3: COLD Datasets
5. Results and Discussion
5.1. Experiment 1
5.2. Experiment 2
5.3. Experiment 3
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Lee, I. Service Robots: A Systematic Literature Review. Electronics 2021, 10, 2658.
- Zachiotis, G.A.; Andrikopoulos, G.; Gornez, R.; Nakamura, K.; Nikolakopoulos, G. A Survey on the Application Trends of Home Service Robotics. In Proceedings of the IEEE International Conference on Robotics and Biomimetics (ROBIO), Kuala Lumpur, Malaysia, 12–15 December 2018; pp. 1999–2006.
- Asgharian, P.; Panchea, A.M.; Ferland, F. A Review on the Use of Mobile Service Robots in Elderly Care. Robotics 2022, 11, 127.
- Skrzypczyński, P.; Tobis, S. Eldercare Robots in the Age of AI: Are We Ready to Address the User Needs? In Proceedings of the 3rd Polish Conference on Artificial Intelligence PP-RAI’2022, Gdynia, Poland, 25–27 April 2022; pp. 116–121.
- Huang, J.; Junginger, S.; Liu, H.; Thurow, K. Indoor Positioning Systems of Mobile Robots: A Review. Robotics 2023, 12, 47.
- Sousa, R.B.; Sobreira, H.M.; Moreira, A.P. A systematic literature review on long-term localization and mapping for mobile robots. J. Field Robot. 2023, 40, 1245–1322.
- Wietrzykowski, J.; Skrzypczyński, P. PlaneLoc: Probabilistic global localization in 3-D using local planar features. Robot. Auton. Syst. 2019, 113, 160–173.
- Rostkowska, M.; Skrzypczyński, P. Hybrid field of view vision: From biological inspirations to integrated sensor design. In Proceedings of the IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), Baden-Baden, Germany, 19–21 September 2016; pp. 629–634.
- Skrzypczyński, P.; Rostkowska, M.; Wasik, M. Bio-Inspired, Real-Time Passive Vision for Mobile Robots. In Machine Vision and Navigation; Springer International Publishing: Cham, Switzerland, 2020; pp. 33–58.
- Lowry, S.; Sünderhauf, N.; Newman, P.; Leonard, J.J.; Cox, D.; Corke, P.; Milford, M.J. Visual Place Recognition: A Survey. IEEE Trans. Robot. 2016, 32, 1–19.
- Rostkowska, M.; Skrzypczyński, P. A Practical Application of QR-codes for Mobile Robot Localization in Home Environment. In Human-Centric Robotics: Proceedings of CLAWAR 2017: 20th International Conference on Climbing and Walking Robots and the Support Technologies for Mobile Machines, Porto, Portugal, 11–13 September 2018; World Scientific: Singapore, 2018; pp. 311–318.
- Arroyo, R.; Alcantarilla, P.F.; Bergasa, L.M.; Romera, E. Towards life-long visual localization using an efficient matching of binary sequences from images. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 6328–6335.
- Wang, T.; Huang, H.; Lin, J.; Hu, C.; Zeng, K.; Sun, M. Omnidirectional CNN for Visual Place Recognition and Navigation. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 2341–2348.
- Yokoyama, A.M.; Ferro, M.; de Paula, F.B.; Vieira, V.G.; Schulze, B. Investigating hardware and software aspects in the energy consumption of machine learning: A green AI-centric analysis. In Concurrency and Computation: Practice and Experience; Wiley: Hoboken, NJ, USA, 2023; p. e7825.
- Süzen, A.A.; Duman, B.; Şen, B. Benchmark Analysis of Jetson TX2, Jetson Nano and Raspberry PI using Deep-CNN. In Proceedings of the International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), Ankara, Turkey, 26–27 June 2020; pp. 1–5.
- Lemaire, T.; Berger, C.; Jung, I.K.; Lacroix, S. Vision-Based SLAM: Stereo and Monocular Approaches. Int. J. Comput. Vis. 2007, 74, 343–364.
- Macario Barros, A.; Michel, M.; Moline, Y.; Corre, G.; Carrel, F. A Comprehensive Survey of Visual SLAM Algorithms. Robotics 2022, 11, 24.
- Labbé, M.; Michaud, F. Appearance-Based Loop Closure Detection for Online Large-Scale and Long-Term Operation. IEEE Trans. Robot. 2013, 29, 734–745.
- Williams, B.; Cummins, M.; Neira, J.; Newman, P.; Reid, I.; Tardós, J. A comparison of loop closing techniques in monocular SLAM. Robot. Auton. Syst. 2009, 57, 1188–1197.
- Ullah, M.M.; Pronobis, A.; Caputo, B.; Luo, J.; Jensfelt, P.; Christensen, H.I. Towards robust place recognition for robot localization. In Proceedings of the IEEE International Conference on Robotics and Automation, Pasadena, CA, USA, 19–23 May 2008; pp. 530–537.
- Nowicki, M.R.; Wietrzykowski, J.; Skrzypczyński, P. Real-Time Visual Place Recognition for Personal Localization on a Mobile Device. Wirel. Pers. Commun. 2017, 97, 213–244.
- Murillo, A.C.; Guerrero, J.J.; Sagues, C. SURF features for efficient robot localization with omnidirectional images. In Proceedings of the IEEE International Conference on Robotics and Automation, Roma, Italy, 10–14 April 2007; pp. 3901–3907.
- Schmidt, A.; Kraft, M.; Fularz, M.; Domagala, Z. Comparative assessment of point feature detectors and descriptors in the context of robot navigation. J. Autom. Mob. Robot. Intell. Syst. 2013, 7, 11–20.
- Sivic, J.; Zisserman, A. Video Google: A text retrieval approach to object matching in videos. In Proceedings of the Ninth IEEE International Conference on Computer Vision, Nice, France, 14–17 October 2003; Volume 2, pp. 1470–1477.
- Cummins, M.; Newman, P. FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance. Int. J. Robot. Res. 2008, 27, 647–665.
- Cummins, M.; Newman, P. Appearance-only SLAM at large scale with FAB-MAP 2.0. Int. J. Robot. Res. 2010, 30, 1100–1123.
- Román, V.; Payá, L.; Peidró, A.; Ballesta, M.; Reinoso, O. The Role of Global Appearance of Omnidirectional Images in Relative Distance and Orientation Retrieval. Sensors 2021, 21, 3327.
- Menegatti, E.; Maeda, T.; Ishiguro, H. Image-based memory for robot navigation using properties of omnidirectional images. Robot. Auton. Syst. 2004, 47, 251–267.
- Payá, L.; Reinoso, O.; Jiménez, L.; Julia, M. Estimating the position and orientation of a mobile robot with respect to a trajectory using omnidirectional imaging and global appearance. PLoS ONE 2017, 12, e0175938.
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893.
- Oliva, A.; Torralba, A. Building the gist of a scene: The role of global image features in recognition. In Progress in Brain Research: Visual Perception; Martinez-Conde, S., Macknik, S., Martinez, L., Alonso, J.M., Tse, P., Eds.; Elsevier: Amsterdam, The Netherlands, 2006; Volume 155, pp. 23–36.
- Cebollada, S.; Payá, L.; Mayol-Cuevas, W.; Reinoso, O. Evaluation of Clustering Methods in Compression of Topological Models and Visual Place Recognition Using Global Appearance Descriptors. Appl. Sci. 2019, 9, 377.
- Ai, H.; Cao, Z.; Zhu, J.; Bai, H.; Chen, Y.; Wang, L. Deep Learning for Omnidirectional Vision: A Survey and New Perspectives. arXiv 2022, arXiv:2205.10468.
- Li, Q.; Li, K.; You, X.; Bu, S.; Liu, Z. Place recognition based on deep feature and adaptive weighting of similarity matrix. Neurocomputing 2016, 199, 114–127.
- Arandjelović, R.; Gronat, P.; Torii, A.; Pajdla, T.; Sivic, J. NetVLAD: CNN architecture for weakly supervised place recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- Zhang, J.; Cao, Y.; Wu, Q. Vector of Locally and Adaptively Aggregated Descriptors for Image Feature Representation. Pattern Recognit. 2021, 116, 107952.
- Jégou, H.; Douze, M.; Schmid, C.; Pérez, P. Aggregating local descriptors into a compact image representation. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 3304–3311.
- Gong, Y.; Wang, L.; Guo, R.; Lazebnik, S. Multi-scale Orderless Pooling of Deep Convolutional Activation Features; Springer International Publishing: Berlin/Heidelberg, Germany, 2014.
- Cheng, R.; Wang, K.; Lin, S.; Hu, W.; Yang, K.; Huang, X.; Li, H.; Sun, D.; Bai, J. Panoramic Annular Localizer: Tackling the Variation Challenges of Outdoor Localization Using Panoramic Annular Images and Active Deep Descriptors. In Proceedings of the IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; pp. 920–925.
- Cebollada, S.; Payá, L.; Flores, M.; Roman, V.; Peidro, A.; Reinoso, O. A Deep Learning Tool to Solve Localization in Mobile Autonomous Robotics. In Proceedings of the 17th International Conference on Informatics in Control, Automation and Robotics, Online, 7–9 July 2020; pp. 232–241.
- Masci, J.; Migliore, D.; Bronstein, M.M.; Schmidhuber, J. Descriptor Learning for Omnidirectional Image Matching. In Registration and Recognition in Images and Videos; Springer: Berlin/Heidelberg, Germany, 2014; pp. 49–62.
- Ballesta, M.; Payá, L.; Cebollada, S.; Reinoso, O.; Murcia, F. A CNN Regression Approach to Mobile Robot Localization Using Omnidirectional Images. Appl. Sci. 2021, 11, 7521.
- Mora, J.C.; Cebollada, S.; Flores, M.; Reinoso, Ó.; Payá, L. Training, Optimization and Validation of a CNN for Room Retrieval and Description of Omnidirectional Images. SN Comput. Sci. 2022, 3, 271.
- Cunningham, P.; Delany, S.J. k-Nearest Neighbour Classifiers: A Tutorial. ACM Comput. Surv. 2021, 54, 1–25.
- Kramer, O. K-Nearest Neighbors. In Dimensionality Reduction with Unsupervised Nearest Neighbors; Springer: Berlin/Heidelberg, Germany, 2013; pp. 13–23.
- Ab Wahab, M.N.; Nazir, A.; Zhen Ren, A.T.; Mohd Noor, M.H.; Akbar, M.F.; Mohamed, A.S.A. EfficientNet-Lite and Hybrid CNN-KNN Implementation for Facial Expression Recognition on Raspberry Pi. IEEE Access 2021, 9, 134065–134080.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
- Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning (ICML), Long Beach, CA, USA, 10–15 June 2019; Volume 97, pp. 6105–6114.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Rajani, N.; McArdle, K.; Dhillon, I.S. Parallel k nearest neighbor graph construction using tree-based data structures. In Proceedings of the 1st High Performance Graph Mining Workshop, Sydney, Australia, 10 August 2015; Volume 1, pp. 3–11.
- Silpa-Anan, C.; Hartley, R. Optimised KD-trees for fast image descriptor matching. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
- Facebook AI Research. Faiss. 2022. Available online: https://github.com/facebookresearch/faiss (accessed on 17 June 2023).
- Norouzi, M.; Fleet, D.; Salakhutdinov, R. Hamming Distance Metric Learning. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–8 December 2012; Curran Associates, Inc.: Red Hook, NY, USA, 2012; Volume 25.
- Pronobis, A.; Caputo, B. COLD: COsy Localization Database. Int. J. Robot. Res. 2009, 28, 588–594.
- Shuvo, M.M.H.; Islam, S.K.; Cheng, J.; Morshed, B.I. Efficient Acceleration of Deep Learning Inference on Resource-Constrained Edge Devices: A Review. Proc. IEEE 2023, 111, 42–91.
- Scaramuzza, D. Omnidirectional Vision: From Calibration to Robot Motion Estimation. Ph.D. Thesis, ETH Zurich, Zürich, Switzerland, 2007.
- Baker, S.; Nayar, S.K. A Theory of Single-Viewpoint Catadioptric Image Formation. Int. J. Comput. Vis. 1999, 35, 175–196.
- Kowa. 4.4-11mm F1.6 LMVZ4411 1/1.8" Lens. 2023. Available online: https://cmount.com/product/kowa-4-4-11mm-f1-6-lmvz4411-1-1-8-lens-c-mount (accessed on 17 June 2023).
- Bazin, J.C. Catadioptric Vision for Robotic Applications. Ph.D. Thesis, KAIST, Daejeon, Republic of Korea, 2019.
- Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946.
- Tan, M.; Le, Q. EfficientNetV2: Smaller Models and Faster Training. arXiv 2021, arXiv:2104.00298.
- Nanne. pytorch-NetVlad. 2023. Available online: https://github.com/Nanne/pytorch-NetVlad (accessed on 17 June 2023).
- Johnson, J.M.; Khoshgoftaar, T.M. Survey on deep learning with class imbalance. J. Big Data 2019, 6, 27.
| Image Type | Config. A, Training Dataset | Config. A, Validation Dataset | Config. B, Training Dataset | Config. B, Validation Dataset | Config. C, Training Dataset | Config. C, Validation Dataset |
|---|---|---|---|---|---|---|
| omnidirectional | 994 (25,844) | 250 (6500) | 959 (24,934) | 241 (6266) | 753 (19,578) | 288 (7488) |
| panoramic | 2982 (77,532) | 750 (19,500) | 2611 (67,886) | 653 (16,978) | 2259 (58,734) | 864 (22,464) |
| Room | Training Dataset 1 (575 images): Cloudy, n.i. (%) | Training Dataset 2 (820 images): Cloudy, n.i. (%) | Training Dataset 2: Sunny, n.i. (%) | Training Dataset 2: Night, n.i. (%) | Training Dataset 3 (1801 images): Cloudy, n.i. (%) | Training Dataset 3: Sunny, n.i. (%) | Training Dataset 3: Night, n.i. (%) |
|---|---|---|---|---|---|---|---|
| All rooms (total) | 575 (100) | 576 (70.2) | 139 (17.0) | 105 (12.8) | 573 (31.8) | 651 (36.2) | 577 (32.0) |
| 1PO-A | 45 (100) | 47 (69.1) | 15 (22.0) | 6 (8.8) | 46 (31.3) | 54 (36.7) | 47 (33.0) |
| 2PO1-A | 52 (100) | 50 (79.4) | 8 (12.7) | 5 (8.0) | 48 (36.9) | 47 (36.3) | 35 (26.9) |
| 2PO2-A | 33 (100) | 30 (58.8) | 8 (15.7) | 13 (25.5) | 34 (30.4) | 40 (35.7) | 38 (33.9) |
| CR-A | 248 (100) | 249 (76.9) | 43 (13.3) | 32 (9.9) | 247 (33.2) | 267 (35.9) | 229 (30.8) |
| KT-A | 43 (100) | 41 (42.3) | 31 (32.0) | 25 (25.8) | 40 (19.9) | 79 (39.3) | 82 (40.8) |
| LO-A | 32 (100) | 31 (62.0) | 12 (24.0) | 7 (14.0) | 34 (33.7) | 35 (34.7) | 32 (31.7) |
| PA-A | 58 (100) | 58 (82.9) | 8 (11.4) | 4 (5.7) | 58 (37.2) | 55 (35.3) | 43 (27.7) |
| ST-A | 31 (100) | 33 (76.7) | 5 (11.6) | 5 (11.6) | 31 (31.3) | 36 (36.4) | 32 (32.3) |
| TL-A | 33 (100) | 37 (68.5) | 9 (16.7) | 8 (14.8) | 35 (31.3) | 38 (33.9) | 39 (34.9) |
| Room | Training Dataset 1 (575 images), n.i. (%) | Training Dataset 2 (820 images), n.i. (%) | Training Dataset 3 (1801 images), n.i. (%) |
|---|---|---|---|
| 1PO-A | 45 (7.83) | 68 (8.29) | 147 (8.16) |
| 2PO1-A | 52 (9.04) | 63 (7.68) | 130 (7.22) |
| 2PO2-A | 33 (5.74) | 51 (6.21) | 112 (6.22) |
| CR-A | 248 (43.13) | 324 (39.51) | 743 (41.25) |
| KT-A | 43 (7.48) | 97 (11.83) | 201 (11.16) |
| LO-A | 32 (5.57) | 50 (6.1) | 101 (5.61) |
| PA-A | 58 (10.09) | 70 (8.54) | 156 (8.66) |
| ST-A | 31 (5.39) | 43 (5.24) | 99 (5.50) |
| TL-A | 33 (5.74) | 54 (6.59) | 112 (6.22) |
Experiment 2

| Neural Network | Image Type | Config. A [m] | Config. A [s] | Config. A [h] | Config. B [m] | Config. B [s] | Config. B [h] | Config. C [m] | Config. C [s] | Config. C [h] |
|---|---|---|---|---|---|---|---|---|---|---|
| EfficientNet B7 | omni | 0.00 | 0.52 | 2.15 | 3.06 | 0.48 | 3.25 | 4.43 | 0.47 | 2.16 |
| EfficientNet B7 | panoramic | 0.03 | 0.56 | 37.21 | 3.21 | 0.49 | 16.24 | 3.92 | 0.50 | 11.30 |
| EfficientNet V2L | omni | 0.00 | 0.35 | 1.98 | 2.34 | 0.35 | 3.84 | 4.94 | 0.34 | 2.07 |
| EfficientNet V2L | panoramic | 0.00 | 0.39 | 14.54 | 3.11 | 0.37 | 15.46 | 3.60 | 0.36 | 12.14 |
| MobileNet V2 | omni | 0.02 | 0.08 | 2.24 | 3.86 | 0.07 | 3.15 | 5.01 | 0.07 | 1.55 |
| MobileNet V2 | panoramic | 0.36 | 0.11 | 16.32 | 4.33 | 0.11 | 15.56 | 6.87 | 0.11 | 11.53 |
Experiment 2

| Neural Network | Config. A, Omni [m] | Config. A, Panoramic [m] | Config. B, Omni [m] | Config. B, Panoramic [m] | Config. C, Omni [m] | Config. C, Panoramic [m] |
|---|---|---|---|---|---|---|
| EfficientNet B7 + embeddings | 0.00 | 0.03 | 3.06 | 3.21 | 4.43 | 3.92 |
| EfficientNet V2L + embeddings | 0.00 | 0.00 | 2.34 | 3.11 | 4.94 | 3.60 |
| NetVLAD (VGG16 + VLAD) | 0.00 | 0.10 | 2.27 | 3.77 | 2.24 | 4.60 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).