Abstract
Visual Simultaneous Localization and Mapping (VSLAM) technology can provide reliable visual localization and mapping capabilities for critical tasks. Existing VSLAM systems can extract accurate feature points in static environments for matching and pose estimation, and then build an environmental map. In dynamic environments, however, the feature points extracted by the VSLAM system become unreliable as objects move, which not only leads to tracking failure but also seriously degrades the accuracy of the environmental map. To address these challenges, we propose a dynamic target-aware optical flow tracking method based on YOLOv8. First, we use YOLOv8 to identify moving targets in the environment and propose a method to eliminate dynamic points within the dynamic contour region. Second, we use an optical flow mask to identify dynamic feature points lying outside the object detection bounding boxes. Third, we eliminate all of the dynamic feature points identified by the two cues. Finally, we combine the geometric and semantic information of the static map points to construct a semantic map of the environment. Using Absolute Trajectory Error (ATE) and Relative Pose Error (RPE) as evaluation metrics, we compared our method with the original method on the TUM dataset. The accuracy of our method is significantly improved, by as much as 96.92% on the walking_xyz sequence. The experimental results show that the proposed method can significantly improve the overall performance of VSLAM systems in highly dynamic environments.
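To make the two-cue filtering pipeline described above concrete, the sketch below combines YOLOv8 detections with a Lucas-Kanade optical flow consistency check to discard dynamic feature points. It is a minimal illustration, not the authors' implementation: the ultralytics and OpenCV calls are real APIs, but the filter_dynamic_points helper, the weights file, the flow threshold, and the dynamic class list are all our assumptions, and the bounding-box test is a simplification of the paper's contour-region check.

```python
# Hypothetical sketch of dynamic feature-point filtering (not the paper's code).
# Assumptions: ultralytics YOLOv8 weights, OpenCV, a grayscale frame pair,
# and ORB-style cv2.KeyPoint inputs.
import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO("yolov8n.pt")      # assumed pretrained weights; the paper's model may differ
DYNAMIC_CLASSES = {"person"}    # assumed set of classes treated as potentially moving
FLOW_THRESHOLD = 2.0            # assumed threshold (px) on deviation from median flow

def filter_dynamic_points(prev_gray, cur_gray, cur_bgr, keypoints):
    """Keep only the keypoints judged static by both the semantic and flow cues."""
    if not keypoints:
        return []
    pts = np.float32([kp.pt for kp in keypoints]).reshape(-1, 1, 2)

    # Semantic cue: bounding boxes of potentially dynamic objects from YOLOv8
    # (a simplification of the paper's contour-region elimination).
    boxes = []
    for result in model(cur_bgr, verbose=False):
        for box in result.boxes:
            if model.names[int(box.cls)] in DYNAMIC_CLASSES:
                boxes.append(box.xyxy[0].cpu().numpy())  # [x1, y1, x2, y2]

    # Geometric cue: pyramidal Lucas-Kanade flow from the previous frame.
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts, None)
    flow = np.linalg.norm(nxt.reshape(-1, 2) - pts.reshape(-1, 2), axis=1)
    ok = status.ravel() == 1
    median_flow = np.median(flow[ok]) if ok.any() else 0.0

    static = []
    for kp, mag, valid in zip(keypoints, flow, ok):
        x, y = kp.pt
        in_box = any(x1 <= x <= x2 and y1 <= y <= y2 for x1, y1, x2, y2 in boxes)
        # A point is kept only if it lies outside all detected boxes and its
        # flow is consistent with the median flow induced by camera ego-motion.
        if valid and not in_box and abs(mag - median_flow) <= FLOW_THRESHOLD:
            static.append(kp)
    return static
```

The surviving static points would then feed the usual pose estimation and mapping stages, which is where the geometric and semantic information of the static map points is fused into the semantic map.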
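The abstract reports its gains in terms of ATE, which on the TUM RGB-D benchmark is the RMSE of translational residuals after rigidly aligning the estimated trajectory to ground truth (Horn/Umeyama alignment). The sketch below is a minimal version assuming the two trajectories are already associated by timestamp into N x 3 position arrays; the ate_rmse name is ours, not the benchmark's.

```python
# Minimal ATE (Absolute Trajectory Error) sketch in the spirit of the TUM RGB-D
# evaluation: align estimated positions to ground truth with the closed-form
# Umeyama/Kabsch method, then report the RMSE of the residuals.
import numpy as np

def ate_rmse(gt, est):
    """gt, est: (N, 3) arrays of time-associated positions. Returns ATE RMSE."""
    gt_c = gt - gt.mean(axis=0)            # center both trajectories
    est_c = est - est.mean(axis=0)
    # Optimal rotation from the SVD of the cross-covariance matrix.
    U, _, Vt = np.linalg.svd(est_c.T @ gt_c)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # fix a reflection, keep a proper rotation
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = gt.mean(axis=0) - R @ est.mean(axis=0)
    aligned = (R @ est.T).T + t
    residuals = np.linalg.norm(aligned - gt, axis=1)
    return np.sqrt(np.mean(residuals ** 2))
```

RPE is computed analogously over relative motions between frame pairs at a fixed time offset; the official TUM benchmark scripts (associate.py, evaluate_ate.py, evaluate_rpe.py) implement both metrics.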
Code and Data Availability
The code that supports the findings of this study is available from the corresponding author, Dr. Zhuhua Hu, upon reasonable request. The datasets used are publicly available at https://vision.in.tum.de/data/datasets/rgbd-dataset. The DFT-VSLAM demo video can be accessed at the following link: https://www.bilibili.com/video/BV16W421A7t4/?spm_id_from=333.999.0.0.
Acknowledgements
This research was supported by the National Natural Science Foundation of China (Grant no. 62161010), the Key Research and Development Project of Hainan Province (Grant no. ZDYF2022GXJS348 and Grant no. ZDYF2022SHFZ039), and the Hainan Province Natural Science Foundation (623RC446). The authors would like to thank the referees for their constructive suggestions.
Author information
Contributions
All authors contributed to the conception or design of the work, the analysis and interpretation of the data, and the draft of the manuscript.
Ethics declarations
Conflicts of Interest
No conflict of interest exists in the submission of this manuscript.
Ethics Approval
Not applicable (this article does not contain any studies with human participants or animals performed by any of the authors).
Consent to Participate
All authors participated in the conception and design of the work or in the analysis and interpretation of the data; in drafting the article or revising it critically for important intellectual content; and in approving the final version.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Cai, D., Li, S., Qi, W. et al. DFT-VSLAM: A Dynamic Optical Flow Tracking VSLAM Method. J Intell Robot Syst 110, 135 (2024). https://doi.org/10.1007/s10846-024-02171-7
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s10846-024-02171-7