Abstract
In recent years, the demand for robots has extended beyond sophisticated industrial setups: there is an unprecedented demand for low-cost robots in living spaces that can perform human-centric operations. For semantic-rich mapping of unstructured environments, current state-of-the-art techniques rely on sophisticated hardware such as Kinect sensors, LiDAR, deep learning (DL)-based vision, and stereo vision systems. Inevitably, such systems raise the product cost, since they require expensive hardware to process the information, and this hinders their adoption on low-cost service robots where interaction matters more than precision. To overcome these issues, this paper proposes two novel techniques: 1) a lightweight yet efficient semantic mapping technique that localizes objects scene-wise by combining object detection with camera geometry; and 2) an accurate and robust integration technique that merges scene-wise information into large-scale maps. The main goal of this framework is to host the semantic mapping process on a device with limited processing power, such as a Raspberry Pi. The resulting semantic information can be further integrated into any Human-Robot Interaction (HRI) system. The mapping pipeline uses a TensorFlow Lite version of the Single Shot Detector (SSD) for object detection, wheel odometry for pose tracking, and pinhole camera geometry. The proposed model demonstrates promising results, accurately mapping the environment with semantic-rich features. The approach is time-efficient and well suited for object-oriented task execution on low-cost robots such as smart toys and other smart-home gadgets.
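To illustrate the kind of pipeline the abstract describes, the following is a minimal sketch (not the authors' exact implementation) of combining TensorFlow Lite SSD detections, pinhole camera geometry, and a wheel-odometry pose to place detected objects in a map frame. The model file name, camera intrinsics, score threshold, and the table of known object heights are illustrative assumptions, and the SSD output-tensor ordering can vary between exported models.

```python
# Sketch: TFLite SSD detection + pinhole geometry + wheel-odometry pose.
# All constants below (intrinsics, model path, object heights) are assumptions.
import math
import numpy as np
import cv2
import tflite_runtime.interpreter as tflite

FX, FY = 540.0, 540.0          # assumed focal lengths (pixels)
CX, CY = 320.0, 240.0          # assumed principal point (pixels)
KNOWN_HEIGHT_M = {"chair": 0.9, "person": 1.7}   # rough object heights for range estimation

interpreter = tflite.Interpreter(model_path="ssd_mobilenet.tflite")  # hypothetical model file
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
outs = interpreter.get_output_details()

def detect(frame_bgr, score_thresh=0.5):
    """Run SSD on one frame; return (class_id, score, box) with boxes in pixel coordinates."""
    h, w = frame_bgr.shape[:2]
    ih, iw = int(inp["shape"][1]), int(inp["shape"][2])
    img = cv2.resize(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB), (iw, ih))
    interpreter.set_tensor(inp["index"], np.expand_dims(img, 0).astype(inp["dtype"]))
    interpreter.invoke()
    # Typical SSD TFLite output order: boxes, classes, scores (may differ per model).
    boxes = interpreter.get_tensor(outs[0]["index"])[0]
    classes = interpreter.get_tensor(outs[1]["index"])[0]
    scores = interpreter.get_tensor(outs[2]["index"])[0]
    dets = []
    for box, cls, score in zip(boxes, classes, scores):
        if score < score_thresh:
            continue
        y1, x1, y2, x2 = box  # normalized coordinates
        dets.append((int(cls), float(score), (x1 * w, y1 * h, x2 * w, y2 * h)))
    return dets

def localize(box, label, robot_pose):
    """Pinhole geometry: estimate an object's (x, y) in the map frame from one detection.
    robot_pose = (x, y, theta) from wheel odometry."""
    x1, y1, x2, y2 = box
    u = 0.5 * (x1 + x2)                                   # horizontal center of the box
    pix_h = max(y2 - y1, 1.0)
    depth = FY * KNOWN_HEIGHT_M.get(label, 1.0) / pix_h   # Z = f * H / h_pixels
    bearing = math.atan2(CX - u, FX)                      # ray angle through the box center
    rx, ry, rtheta = robot_pose
    return (rx + depth * math.cos(rtheta + bearing),
            ry + depth * math.sin(rtheta + bearing))
```

Objects localized this way in successive scenes can then be merged (for example, by clustering nearby estimates of the same class) into one large-scale semantic map, which is the role of the second technique summarized above.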
Data Availability
The data that support the findings of this study are available upon reasonable request.
Acknowledgements
The support of the Aeronautical Research and Development Board (Grant No. DARO/08/1051450/M/I) is gratefully acknowledged.
Author information
Contributions
AS, KR, AMR - Conceptualization; AS, KR - Data curation; Formal analysis; Investigation; Methodology; Software; Validation; Visualization; Roles/Writing - original draft; Writing - review and editing. AA, KR - Formal analysis; Investigation; Methodology; Roles/Writing - original draft; AS, KR, AMR - Writing - review and editing; AMR - Supervision; Project administration.
Ethics declarations
Conflicts of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical and informed consent for data used
We used a publicly available open-source dataset; thus, informed consent for the data included in the study is not required.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Singh, A., Raj, K. & Roy, A.M. Efficient Deep Learning-based Semantic Mapping Approach using Monocular Vision for Resource-Limited Mobile Robots. J Intell Robot Syst 109, 69 (2023). https://doi.org/10.1007/s10846-023-01988-y