Abstract
Object detection and object recognition are among the most important applications of computer vision. Performing object detection well requires a model with high detection accuracy, but increasing accuracy typically increases the model's size and computation cost, which makes deep learning difficult to deploy in embedded environments. To address this problem, this research proposes a transfer-learning-based model for real-time object detection that improves the effectiveness of the YOLO algorithm, using YOLOv6 as the baseline. The study proposes a pruning and finetuning algorithm, together with a transfer learning algorithm, to improve the model's detection accuracy and inference speed. The paper also describes how the proposed model identifies all objects in a scene, indoor as well as outdoor, and provides voice output to warn the user about nearby and faraway objects. The audio feedback is generated with the Google Text-to-Speech (gTTS) library. The model is trained on the MS-COCO dataset and compared with the TensorFlow Single Shot Detector model, the Faster RCNN model, the Mask RCNN model, YOLOv4, and the baseline YOLOv6 model. After pruning the YOLOv6 baseline model by 30%, 40%, and 50%, the finetuned YOLOv6 framework achieves 37.8% higher average precision (AP) at 1235 frames per second (FPS).
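To make the pruning ratios concrete, the following is a minimal sketch of ratio-based channel pruning. It is not the authors' algorithm: the abstract does not state the pruning criterion, so ranking channels by L1 norm is an assumption, and the names `l1_norm`, `prune_channels`, and the toy `layer` are hypothetical. In practice the pruned network would then be finetuned, which this sketch does not cover.

```python
from typing import List, Sequence


def l1_norm(channel: Sequence[float]) -> float:
    """Sum of absolute weights in one channel (assumed importance score)."""
    return sum(abs(w) for w in channel)


def prune_channels(channels: List[List[float]], percent: int) -> List[int]:
    """Return indices of the channels kept after dropping the `percent`
    share with the smallest L1 norm. Original channel order is preserved."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be in [0, 100)")
    n_remove = len(channels) * percent // 100
    # Rank channel indices from least to most important.
    ranked = sorted(range(len(channels)), key=lambda i: l1_norm(channels[i]))
    removed = set(ranked[:n_remove])
    return [i for i in range(len(channels)) if i not in removed]


# Toy layer: ten channels of three weights each (made-up values).
layer = [[float(i + 1)] * 3 for i in range(10)]

# The three pruning ratios mentioned in the abstract.
for percent in (30, 40, 50):
    kept = prune_channels(layer, percent)
    print(f"prune {percent}%: keep {len(kept)} of {len(layer)} channels")
```

Returning the kept indices rather than the pruned weights keeps the sketch independent of any particular framework; the same indices could be used to slice a real weight tensor.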
Data availability
Data will be made available on appropriate request.
Change history
09 May 2023
A Correction to this paper has been published: https://doi.org/10.1007/s11554-023-01313-8
Author information
Contributions
All authors contributed equally to this manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
The original online version of this article was revised: the affiliation of author Jyotir Moy Chatterjee was stated incorrectly. It has been corrected.
About this article
Cite this article
Gupta, C., Gill, N.S., Gulia, P. et al. A novel finetuned YOLOv6 transfer learning model for real-time object detection. J Real-Time Image Proc 20, 42 (2023). https://doi.org/10.1007/s11554-023-01299-3
DOI: https://doi.org/10.1007/s11554-023-01299-3