Abstract
Since moving objects can significantly degrade the stability of visual-inertial simultaneous localization and mapping (VI-SLAM) systems, this paper proposes a method based on deep learning and space constraints to distinguish dynamic from static semantic objects. In detail, a pre-trained object detector predicts bounding boxes and class probabilities from selected keyframes. Then, a proposed random-sampling clustering algorithm, R-DBSCAN, filters the outliers lying within the prefiltered bounding boxes. After calculating the centroid of the remaining feature points, the semantic object’s attribute is judged by a proposed strategy: if the dynamic probability of the semantic object is larger than its static probability, the object is dynamic; otherwise, it is static. Additionally, the drift errors of the VI-SLAM system are constrained by the pedestrian dead reckoning (PDR) velocity. A series of experiments is conducted on a self-collected dataset to evaluate the accuracy of the proposed method in distinguishing the attributes of semantic objects in typical application scenes. The results demonstrate that the proposed method achieves the highest precision, recall, and F1 score compared with other state-of-the-art methods. Moreover, the change of the centroid of a dynamic semantic object across consecutive keyframes reflects its motion trajectory relative to the pose graph.
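The following is a minimal sketch of the outlier-filtering, centroid, and dynamic/static decision steps summarized above. It is not the authors' implementation: standard DBSCAN (from scikit-learn) stands in for the proposed R-DBSCAN variant, the function names (filter_outliers, object_centroid, classify_object) are hypothetical, and the dynamic/static probabilities are placeholder values rather than the paper's computed quantities.

```python
# Sketch of the per-object pipeline from the abstract (illustrative only).
# Assumptions: plain DBSCAN replaces the paper's R-DBSCAN; probabilities
# p_dynamic / p_static are supplied as placeholders, not computed here.
import numpy as np
from sklearn.cluster import DBSCAN


def filter_outliers(points_in_box, eps=0.15, min_samples=5):
    """Cluster feature points inside a detected bounding box and drop noise.

    points_in_box: (N, 2) or (N, 3) array of feature-point coordinates.
    Returns the inlier points (DBSCAN label != -1).
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points_in_box)
    return points_in_box[labels != -1]


def object_centroid(inlier_points):
    """Centroid of the remaining feature points of one semantic object."""
    return inlier_points.mean(axis=0)


def classify_object(p_dynamic, p_static):
    """Decision rule from the abstract: dynamic if p_dynamic > p_static."""
    return "dynamic" if p_dynamic > p_static else "static"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic feature points: one tight cluster plus a few scattered outliers.
    cluster = rng.normal(loc=[1.0, 2.0], scale=0.05, size=(40, 2))
    outliers = rng.uniform(low=0.0, high=4.0, size=(5, 2))
    points = np.vstack([cluster, outliers])

    inliers = filter_outliers(points)
    print("centroid:", object_centroid(inliers))
    print("attribute:", classify_object(p_dynamic=0.7, p_static=0.3))
```

In the paper the centroid is tracked across consecutive keyframes, so repeating the first two steps per keyframe yields the trajectory of a dynamic object relative to the pose graph.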
Data Availability
The datasets used or analysed during the current study are available from the corresponding author on reasonable request.
Code Availability
custom code
Funding
This work was supported by the Overseas Taishan Scholars Program.
Author information
Authors and Affiliations
Contributions
C.L., W.C., and H.W. designed the algorithm. C.L., M.Z., F.L., and S.L. performed the experiments and analyzed the data. C.L. and Q.L. developed the source code and wrote the draft. All authors revised the manuscript.
Corresponding author
Ethics declarations
Ethics Approval
Not applicable
Conflict of Interests
The authors declare that they have no conflict of interest.
Consent for Publication
Not applicable
Consent to participate
Not applicable
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Chao Li and Wennan Chai contributed equally to this work.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, C., Chai, W., Zhang, M. et al. A Novel Method for Distinguishing Indoor Dynamic and Static Semantic Objects Based on Deep Learning and Space Constraints in Visual-inertial SLAM. J Intell Robot Syst 106, 26 (2022). https://doi.org/10.1007/s10846-022-01730-0