TGC-YOLOv5: An Enhanced YOLOv5 Drone Detection Model Based on Transformer, GAM & CA Attention Mechanism
Figures
Figure 1. SUAV-DATA.
Figure 2. Drone templates.
Figure 3. (a) Length and width distribution statistics of large, medium, and small drones in the SUAV-DATA dataset; (b) statistics of the number of large, medium, and small drones in the SUAV-DATA dataset.
Figure 4. Frame diagram of TGC-YOLOv5.
Figure 5. Feature enhancement performance of the ablation experimental models. NOTE: (a) depicts the original input image, while (b–e) show the heatmaps of head features obtained from models trained with four different algorithms (YOLOv5, YOLOv5 + Transformer, YOLOv5 + Transformer + GAM, and TGC-YOLOv5, respectively) for the same input image.
Figure 6. Frame of the Transformer Encoder Block.
Figure 7. Frame of the Global Attention Mechanism.
Figure 8. Frame of the Channel Attention Mechanism.
Figure 9. Frame of the Spatial Attention Mechanism.
Figure 10. Frame of the Coordinate Attention Mechanism.
Figure 11. Experimental results on SUAV-DATA. (a) Ablation experiment results; (b) parallel experiment results.
Figure 12. AP and FLOPs corresponding to different algorithms. NOTE: circle size is directly proportional to the corresponding algorithm's parameter count.
Figure 13. Comparison of the test results of the original YOLOv5 and TGC-YOLOv5 on the SUAV-DATA dataset.
Figure 14. The four types of pollution (light, fog, stain, and saturation). NOTE: each column represents five levels of a specific type of pollution, recorded as 1, 2, 3, 4, and 5.
Abstract
1. Introduction
1.1. Methods Based on Traditional Image Processing
1.2. Methods Based on Deep Learning
1.3. This Work
- (1) We provide a small target dataset, SUAV-DATA, consisting of 10,000 images capturing small drones from different angles and under complex background conditions. Some targets in these images are occluded, and annotations are provided for all drones.
- (2) We introduce a Transformer encoder module into YOLOv5, enhancing its capability to capture local information.
- (3) We incorporate a global attention mechanism (GAM) to reduce information diffusion between different layers and amplify global interactive features across dimensions. Additionally, we integrate a coordinate attention mechanism (CA) into the bottleneck part of C3, further enhancing the extraction of feature information for small targets.
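To make contribution (3) concrete, the sketch below illustrates the coordinate attention mechanism (Sections 3.5.1 and 3.5.2) in plain NumPy, following the formulation of Hou et al. This is an illustrative sketch only: the module's learned 1×1 convolutions are stood in for by random projection matrices, and the function name, shapes, and reduction default are our own assumptions, not the paper's code.

```python
import numpy as np

def coordinate_attention(x, reduction=8, rng=None):
    """Illustrative NumPy sketch of coordinate attention (Hou et al., 2021).

    x: feature map of shape (C, H, W).
    The 1x1 convolutions of the real module are replaced here by random
    projection matrices; in a trained network these are learned weights.
    Returns a reweighted feature map of shape (C, H, W).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    C, H, W = x.shape
    Cr = max(C // reduction, 1)  # reduced channel count

    # Coordinate information embedding (Sec. 3.5.1):
    # pool along each spatial axis separately, preserving the other.
    z_h = x.mean(axis=2)  # (C, H): average over width for each row
    z_w = x.mean(axis=1)  # (C, W): average over height for each column

    # Coordinate attention generation (Sec. 3.5.2):
    # concatenate, reduce channels, apply nonlinearity, then split back.
    y = np.concatenate([z_h, z_w], axis=1)   # (C, H + W)
    W1 = rng.standard_normal((Cr, C)) * 0.1  # stand-in for 1x1 conv F1
    f = np.maximum(W1 @ y, 0.0)              # (Cr, H + W), ReLU
    f_h, f_w = f[:, :H], f[:, H:]            # split into the two directions

    sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
    W_h = rng.standard_normal((C, Cr)) * 0.1  # stand-in for 1x1 conv F_h
    W_w = rng.standard_normal((C, Cr)) * 0.1  # stand-in for 1x1 conv F_w
    g_h = sigmoid(W_h @ f_h)  # (C, H): attention weights along height
    g_w = sigmoid(W_w @ f_w)  # (C, W): attention weights along width

    # Reweight: y_c(i, j) = x_c(i, j) * g_h_c(i) * g_w_c(j)
    return x * g_h[:, :, None] * g_w[:, None, :]
```

Because the two pooled vectors keep positional information along one axis each, the resulting gates encode where (not just which channels) the salient responses lie, which is the property that helps localize small drone targets when the module is placed inside the C3 bottleneck.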
2. Dataset
3. Framework
3.1. Overview of YOLOv5
3.2. TGC-YOLOv5
3.3. Transformer Encoder Block
3.4. Global Attention Mechanism
3.5. Coordinate Attention Mechanism
3.5.1. Coordinate Information Embedding
3.5.2. Coordinate Attention Generation
4. Results and Discussion
4.1. Determination of the TGC Method’s Position
4.2. Ablation and Parallel Experiments Results
4.3. Comparison of Detection Performance for Different Sizes of Drones
4.4. Experimental Results on Public Drone Datasets
4.5. Robustness Analysis
4.6. Comparison of Small Object Detection Algorithms
4.7. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Monika; Bansal, D.; Passi, A. Image Forgery Detection and Localization Using Block Based and Key-Point Based Feature Matching Forensic Investigation. Wirel. Pers. Commun. 2022, 127, 2823–2839.
- Gangadharan, K.; Kumari, G.R.N.; Dhanasekaran, D.; Malathi, K. Automatic detection of plant disease and insect attack using effta algorithm. Int. J. Adv. Comput. Sci. Appl. 2020, 11.
- Huynh, H.X.; Truong, B.Q.; Nguyen Thanh, K.T.; Truong, D.Q. Plant identification using new architecture convolutional neural networks combine with replacing the red of color channel image by vein morphology leaf. Vietnam J. Comput. Sci. 2020, 7, 197–208.
- Zebari, D.A.; Zeebaree, D.Q.; Abdulazeez, A.M.; Haron, H.; Hamed, H.N.A. Improved threshold based and trainable fully automated segmentation for breast cancer boundary and pectoral muscle in mammogram images. IEEE Access 2020, 8, 203097–203116.
- Srivastava, H.B.; Kumar, V.; Verma, H.; Sundaram, S. Image Pre-processing Algorithms for Detection of Small/Point Airborne Targets. Def. Sci. J. 2009, 59, 166–174.
- Jie, W.; Feng, Z.; Wang, L. High Recognition Ratio Image Processing Algorithm of Micro Electrical Components in Optical Microscope. TELKOMNIKA (Telecommun. Comput. Electron. Control) 2014, 12, 911–920.
- Saha, D. Development of Enhanced Weed Detection System with Adaptive Thresholding, K-Means and Support Vector Machine; South Dakota State University: Brookings, SD, USA, 2019.
- Kang, X.; Song, B.; Guo, J.; Du, X.; Guizani, M. A self-selective correlation ship tracking method for smart ocean systems. Sensors 2019, 19, 821.
- Tang, M.; Liang, K.; Qiu, J. Small insulator target detection based on multi-feature fusion. IET Image Process. 2023, 17, 1520–1533.
- Nebili, B.; Khellal, A.; Nemra, A. Histogram encoding of sift based visual words for target recognition in infrared images. In Proceedings of the 2021 International Conference on Recent Advances in Mathematics and Informatics (ICRAMI), Tebessa, Algeria, 21–22 September 2021; pp. 1–6.
- Zhou, Y.; Tang, Y.; Zou, X.; Wu, M.; Tang, W.; Meng, F.; Zhang, Y.; Kang, H. Adaptive Active Positioning of Camellia oleifera Fruit Picking Points: Classical Image Processing and YOLOv7 Fusion Algorithm. Appl. Sci. 2022, 12, 12959.
- Khalid, S.; Oqaibi, H.M.; Aqib, M.; Hafeez, Y. Small Pests Detection in Field Crops Using Deep Learning Object Detection. Sustainability 2023, 15, 6815.
- Chu, Q.; Ouyang, W.; Li, H.; Wang, X.; Liu, B.; Yu, N. Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4836–4845.
- Xu, S.; Savvaris, A.; He, S.; Shin, H.-s.; Tsourdos, A. Real-time implementation of YOLO+ JPDA for small scale UAV multiple object tracking. In Proceedings of the 2018 International Conference on Unmanned Aircraft Systems (ICUAS), Dallas, TX, USA, 12–15 June 2018; pp. 1336–1341.
- Li, J.; Liang, X.; Wei, Y.; Xu, T.; Feng, J.; Yan, S. Perceptual generative adversarial networks for small object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1222–1230.
- Cao, G.; Xie, X.; Yang, W.; Liao, Q.; Shi, G.; Wu, J. Feature-fused SSD: Fast detection for small objects. In Proceedings of the Ninth International Conference on Graphic and Image Processing (ICGIP 2017), Qingdao, China, 14–16 October 2017; pp. 381–388.
- Liang, H.; Yang, J.; Shao, M. FE-RetinaNet: Small Target Detection with Parallel Multi-Scale Feature Enhancement. Symmetry 2021, 13, 950.
- Luo, X.; Wu, Y.; Wang, F. Target detection method of UAV aerial imagery based on improved YOLOv5. Remote Sens. 2022, 14, 5063.
- Nath, V.; Chattopadhyay, C.; Desai, K. On enhancing prediction abilities of vision-based metallic surface defect classification through adversarial training. Eng. Appl. Artif. Intell. 2023, 117, 105553.
- Tang, Y.; Huang, Z.; Chen, Z.; Chen, M.; Zhou, H.; Zhang, H.; Sun, J. Novel visual crack width measurement based on backbone double-scale features for improved detection automation. Eng. Struct. 2023, 274, 115158.
- Que, Y.; Dai, Y.; Ji, X.; Leung, A.K.; Chen, Z.; Tang, Y.; Jiang, Z. Automatic classification of asphalt pavement cracks using a novel integrated generative adversarial networks and improved VGG model. Eng. Struct. 2023, 277, 115406.
- He, H.; Chen, Q.; Xie, G.; Yang, B.; Li, S.; Zhou, B.; Gu, Y. A Lightweight Deep Learning Model for Real-time Detection and Recognition of Traffic Signs Images Based on YOLOv5. In Proceedings of the 2022 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), Suzhou, China, 17–18 November 2022; pp. 206–212.
- Wei, J.; Wang, Q.; Song, X.; Zhao, Z. The Status and Challenges of Image Data Augmentation Algorithms. J. Phys. Conf. Ser. 2023, 2456, 012041.
- Chen, C.; Liu, M.-Y.; Tuzel, O.; Xiao, J. R-CNN for small object detection. In Proceedings of the Computer Vision–ACCV 2016: 13th Asian Conference on Computer Vision, Taipei, Taiwan, 20–24 November 2016; pp. 214–230.
- Huang, Z.; Wang, F.; You, H.; Hu, Y. STC-Det: A Slender Target Detector Combining Shadow and Target Information in Optical Satellite Images. Remote Sens. 2021, 13, 4183.
- Ju, M.; Luo, J.; Zhang, P.; He, M.; Luo, H. A simple and efficient network for small target detection. IEEE Access 2019, 7, 85771–85781.
- Liu, S.; Wu, R.; Qu, J.; Li, Y. HPN-SOE: Infrared Small Target Detection and Identification Algorithm Based on Heterogeneous Parallel Networks with Similarity Object Enhancement. IEEE Sens. J. 2023, 23, 13797–13809.
- Zhan, J.; Hu, Y.; Cai, W.; Zhou, G.; Li, L. PDAM–STPNNet: A small target detection approach for wildland fire smoke through remote sensing images. Symmetry 2021, 13, 2260.
- Chen, J.; Hong, H.; Song, B.; Guo, J.; Chen, C.; Xu, J. MDCT: Multi-Kernel Dilated Convolution and Transformer for One-Stage Object Detection of Remote Sensing Images. Remote Sens. 2023, 15, 371.
- Li, W.; Wang, Q.; Gao, S. PF-YOLOv4-Tiny: Towards Infrared Target Detection on Embedded Platform. Intell. Autom. Soft Comput. 2023, 37, 921–938.
- Chen, L.; Yang, Y.; Wang, Z.; Zhang, J.; Zhou, S.; Wu, L. Underwater Target Detection Lightweight Algorithm Based on Multi-Scale Feature Fusion. J. Mar. Sci. Eng. 2023, 11, 320.
- Li, X.; Diao, W.; Mao, Y.; Gao, P.; Mao, X.; Li, X.; Sun, X. OGMN: Occlusion-guided multi-task network for object detection in UAV images. ISPRS J. Photogramm. Remote Sens. 2023, 199, 242–257.
- Liu, X.; Wang, C.; Liu, L. Research on pedestrian detection model and compression technology for UAV images. Sensors 2022, 22, 9171.
- Shen, Y.; Liu, D.; Zhang, F.; Zhang, Q. Fast and accurate multi-class geospatial object detection with large-size remote sensing imagery using CNN and Truncated NMS. ISPRS J. Photogramm. Remote Sens. 2022, 191, 235–249.
- Xu, X.; Zhao, S.; Xu, C.; Wang, Z.; Zheng, Y.; Qian, X.; Bao, H. Intelligent Mining Road Object Detection Based on Multiscale Feature Fusion in Multi-UAV Networks. Drones 2023, 7, 250.
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.-C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324.
- Wang, H.; Xu, Y.; He, Y.; Cai, Y.; Chen, L.; Li, Y.; Sotelo, M.A.; Li, Z. YOLOv5-Fog: A multiobjective visual detection algorithm for fog driving scenes based on improved YOLOv5. IEEE Trans. Instrum. Meas. 2022, 71, 1–12.
- Dai, G.; Hu, L.; Fan, J.; Yan, S.; Li, R. A Deep Learning-Based Object Detection Scheme by Improving YOLOv5 for Sprouted Potatoes Datasets. IEEE Access 2022, 10, 85416–85428.
- Wang, L.; Cao, Y.; Wang, S.; Song, X.; Zhang, S.; Zhang, J.; Niu, J. Investigation into recognition algorithm of Helmet violation based on YOLOv5-CBAM-DCN. IEEE Access 2022, 10, 60622–60632.
- Gao, S.-H.; Cheng, M.-M.; Zhao, K.; Zhang, X.-Y.; Yang, M.-H.; Torr, P. Res2net: A new multi-scale backbone architecture. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 652–662.
- Wang, Z.; Zhang, H.; Lin, Z.; Tan, X.; Zhou, B. Prohibited Items Detection in Baggage Security Based on Improved YOLOv5. In Proceedings of the 2022 IEEE 2nd International Conference on Software Engineering and Artificial Intelligence (SEAI), Xiamen, China, 16–18 June 2022; pp. 20–25.
- Yang, R.; Li, W.; Shang, X.; Zhu, D.; Man, X. KPE-YOLOv5: An Improved Small Target Detection Algorithm Based on YOLOv5. Electronics 2023, 12, 817.
- Hong, W.; Ma, Z.; Ye, B.; Yu, G.; Tang, T.; Zheng, M. Detection of Green Asparagus in Complex Environments Based on the Improved YOLOv5 Algorithm. Sensors 2023, 23, 1562.
- Gong, H.; Mu, T.; Li, Q.; Dai, H.; Li, C.; He, Z.; Wang, W.; Han, F.; Tuniyazi, A.; Li, H. Swin-Transformer-Enabled YOLOv5 with Attention Mechanism for Small Object Detection on Satellite Images. Remote Sens. 2022, 14, 2861.
- Xiao, Z.; Sun, E.; Yuan, F.; Peng, J.; Liu, J. Detection Method of Damaged Camellia Oleifera Seeds Based on YOLOv5-CB. IEEE Access 2022, 10, 126133–126141.
- Ren, J.; Wang, Z.; Zhang, Y.; Liao, L. YOLOv5-R: Lightweight real-time detection based on improved YOLOv5. J. Electron. Imaging 2022, 31, 033033.
- Qi, J.; Liu, X.; Liu, K.; Xu, F.; Guo, H.; Tian, X.; Li, M.; Bao, Z.; Li, Y. An improved YOLOv5 model based on visual attention mechanism: Application to recognition of tomato virus disease. Comput. Electron. Agric. 2022, 194, 106780.
- Zhu, Y.; Li, S.; Du, W.; Du, Y.; Liu, P.; Li, X. Identification of table grapes in the natural environment based on an improved Yolov5 and localization of picking points. Precis. Agric. 2023, 24, 1333–1354.
- Li, Y.; Bai, X.; Xia, C. An Improved YOLOV5 Based on Triplet Attention and Prediction Head Optimization for Marine Organism Detection on Underwater Mobile Platforms. J. Mar. Sci. Eng. 2022, 10, 1230.
- Dai, J.; Zhang, X. Automatic image caption generation using deep learning and multimodal attention. Comput. Animat. Virtual Worlds 2022, 33, e2072.
- Pawełczyk, M.; Wojtyra, M. Real world object detection dataset for quadcopter unmanned aerial vehicle detection. IEEE Access 2020, 8, 174394–174409.
- Zheng, Y.; Chen, Z.; Lv, D.; Li, Z.; Lan, Z.; Zhao, S. Air-to-air visual detection of micro-uavs: An experimental evaluation of deep learning. IEEE Robot. Autom. Lett. 2021, 6, 1020–1027.
- Walter, V.; Vrba, M.; Saska, M. On training datasets for machine learning-based visual relative localization of micro-scale UAVs. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 10674–10680.
- Chen, Y.; Aggarwal, P.; Choi, J.; Kuo, C.-C.J. A deep learning approach to drone monitoring. In Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Kuala Lumpur, Malaysia, 12–15 December 2017; pp. 686–691.
- Torralba, A.; Fergus, R.; Freeman, W.T. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 1958–1970.
- Dong, Z.; Wang, M.; Wang, Y.; Zhu, Y.; Zhang, Z. Object detection in high resolution remote sensing imagery based on convolutional neural networks with suitable object scale features. IEEE Trans. Geosci. Remote Sens. 2019, 58, 2104–2114.
- Jocher, G.; Stoken, A.; Borovec, J.; Chaurasia, A.; Changyu, L.; Hogan, A.; Hajek, J.; Diaconu, L.; Kwon, Y.; Defretin, Y. ultralytics/yolov5: v5.0 - YOLOv5-P6 1280 Models, AWS, Supervise.ly and YouTube Integrations. Zenodo 2021.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
- Liu, Y.; Shao, Z.; Hoffmann, N. Global Attention Mechanism: Retain Information to Enhance Channel-Spatial Interactions. arXiv 2021, arXiv:2112.05561.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722.
- Kopuklu, O.; Kose, N.; Gunduz, A.; Rigoll, G. Resource efficient 3d convolutional neural networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019.
- Cao, J.; Zhang, J.; Huang, W. Traffic sign detection and recognition using multi-scale fusion and prime sample attention. IEEE Access 2020, 9, 3579–3591.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Zhang, R.; Shao, Z.; Huang, X.; Wang, J.; Wang, Y.; Li, D. Adaptive dense pyramid network for object detection in UAV imagery. Neurocomputing 2022, 489, 377–389.
- Cai, Z.; Vasconcelos, N. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6154–6162.
- Chalavadi, V.; Jeripothula, P.; Datla, R.; Ch, S.B. mSODANet: A network for multi-scale object detection in aerial images using hierarchical dilated convolutions. Pattern Recognit. 2022, 126, 108548.
- Benz, U.C.; Hofmann, P.; Willhauck, G.; Lingenfelder, I.; Heynen, M. Multi-resolution, object-oriented fuzzy analysis of remote sensing data for GIS-ready information. ISPRS J. Photogramm. Remote Sens. 2004, 58, 239–258.
- Zhang, R.; Shao, Z.; Huang, X.; Wang, J.; Li, D. Object detection in UAV images via global density fused convolutional network. Remote Sens. 2020, 12, 3140.
Model | Precision | Recall | mAP | GFLOPs | Params (M)
---|---|---|---|---|---
TGC_C3-1 | 0.926 | 0.811 | 0.827 | 14.0 | 16.3
TGC_C3-2 | 0.934 | 0.817 | 0.832 | 18.5 | 17.2
TGC_C3-3 | 0.939 | 0.823 | 0.848 | 13.4 | 19.7
Model | Precision | Recall | mAP | FPS | GFLOPs | Params (M)
---|---|---|---|---|---|---
Faster_rcnn-r50_fpn | 0.801 | 0.588 | 0.801 | 31.5 | 210.6 | 43.8
RetinaNet | 0.820 | 0.625 | 0.820 | 55.4 | 205.5 | 36.2
SSD 300 | 0.767 | 0.557 | 0.767 | 47.5 | 384.7 | 33.6
YOLOv5s | 0.938 | 0.785 | 0.823 | 84.3 | 15.8 | 7.0
YOLOv5s-Transformer | 0.940 | 0.790 | 0.833 | 77.1 | 16.0 | 8.5
Y-T-CBAM | 0.925 | 0.785 | 0.831 | 83.7 | 14.3 | 9.1
Y-T-SE | 0.903 | 0.799 | 0.830 | 75.4 | 16.5 | 13.3
Y-T-NAM | 0.922 | 0.794 | 0.835 | 81.2 | 15.6 | 17.6
Y-T-CA | 0.934 | 0.788 | 0.835 | 72.9 | 17.1 | 8.1
Y-T-GAM | 0.948 | 0.790 | 0.837 | 85.0 | 14.0 | 20.0
TGC-YOLOv5 (Ours) | 0.939 | 0.823 | 0.848 | 86.5 | 13.4 | 19.7
Model | Small Precision | Small Recall | Small mAP | Medium Precision | Medium Recall | Medium mAP | Large Precision | Large Recall | Large mAP
---|---|---|---|---|---|---|---|---|---
YOLOv5s | 0.891 | 0.765 | 0.847 | 0.943 | 0.907 | 0.950 | 0.943 | 0.928 | 0.953
TGC-YOLOv5 | 0.902 | 0.817 | 0.880 | 0.976 | 0.915 | 0.965 | 0.935 | 0.947 | 0.962
Dataset | Model | Precision | Recall | mAP
---|---|---|---|---
Real-World | YOLOv5s | 0.957 | 0.919 | 0.966
Real-World | TGC-YOLOv5 | 0.959 | 0.936 | 0.975
Drone-dataset | YOLOv5s | 0.928 | 0.905 | 0.937
Drone-dataset | TGC-YOLOv5 | 0.933 | 0.916 | 0.951
Pollution | Level | SSD 300 | Faster_rcnn-r50_fpn | RetinaNet | YOLOv5s | TGC-YOLOv5 (Ours)
---|---|---|---|---|---|---
Light | 1 | 0.851 | 0.856 | 0.860 | 0.863 | 0.870
Light | 2 | 0.845 | 0.850 | 0.854 | 0.859 | 0.862
Light | 3 | 0.828 | 0.830 | 0.834 | 0.837 | 0.841
Light | 4 | 0.803 | 0.811 | 0.818 | 0.822 | 0.830
Light | 5 | 0.781 | 0.794 | 0.799 | 0.804 | 0.807
Fog | 1 | 0.843 | 0.847 | 0.852 | 0.861 | 0.865
Fog | 2 | 0.839 | 0.842 | 0.846 | 0.853 | 0.857
Fog | 3 | 0.826 | 0.829 | 0.831 | 0.833 | 0.836
Fog | 4 | 0.814 | 0.817 | 0.820 | 0.824 | 0.832
Fog | 5 | 0.813 | 0.816 | 0.814 | 0.819 | 0.823
Stain | 1 | 0.848 | 0.852 | 0.854 | 0.860 | 0.867
Stain | 2 | 0.842 | 0.847 | 0.852 | 0.863 | 0.865
Stain | 3 | 0.834 | 0.833 | 0.835 | 0.838 | 0.844
Stain | 4 | 0.735 | 0.739 | 0.742 | 0.748 | 0.752
Stain | 5 | 0.678 | 0.674 | 0.681 | 0.686 | 0.688
Saturation | 1 | 0.859 | 0.862 | 0.865 | 0.867 | 0.872
Saturation | 2 | 0.846 | 0.853 | 0.856 | 0.862 | 0.868
Saturation | 3 | 0.836 | 0.839 | 0.841 | 0.843 | 0.847
Saturation | 4 | 0.825 | 0.828 | 0.832 | 0.836 | 0.842
Saturation | 5 | 0.805 | 0.811 | 0.817 | 0.820 | 0.824
Model | mAP50 | mAP | GFLOPs | Params (M) |
---|---|---|---|---|
Faster-RCNN [64] | 0.310 | 0.172 | 118.8 | 41.2 |
Cascade ADPN [65] | 0.387 | 0.228 | 547.2 | 90.8 |
Cascade-RCNN [66] | 0.388 | 0.226 | 146.6 | 69.0 |
mSODANet [67] | 0.559 | 0.369 | 10.6 | 22.0 |
AdNet-SS [68] | 0.579 | 0.311 | 32.8 | 77.2 |
YOLOv5s | 0.537 | 0.317 | 16.3 | 7.04 |
YOLOv5m | 0.586 | 0.354 | 48.2 | 20.9 |
RetinaNet [68] | 0.443 | 0.227 | 35.7 | 36.4 |
Grid GDF [69] | 0.308 | 0.182 | 257.6 | 72.0 |
SABL [68] | 0.412 | 0.250 | 145.5 | 99.6 |
YOLOX-s | 0.535 | 0.314 | 26.8 | 9.0 |
This work | 0.597 | 0.385 | 13.4 | 19.7 |
Data | Model | Precision | Recall | mAP | GFLOPs | Params (M)
---|---|---|---|---|---|---
Real Data | YOLOv5s | 0.923 | 0.865 | 0.925 | 15.8 | 7.0
Real Data | TGC-YOLOv5 | 0.938 | 0.886 | 0.936 | 19.5 | 13.5
Synthetic Data | YOLOv5s | 0.927 | 0.872 | 0.932 | 15.8 | 7.0
Synthetic Data | TGC-YOLOv5 | 0.946 | 0.877 | 0.945 | 19.5 | 13.5
Model | Precision | Recall | mAP | GFLOPs | Params (M)
---|---|---|---|---|---
YOLOv8s | 0.942 | 0.835 | 0.839 | 28.4 | 11.1
TGC-YOLOv8 | 0.957 | 0.848 | 0.860 | 27.7 | 21.6
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhao, Y.; Ju, Z.; Sun, T.; Dong, F.; Li, J.; Yang, R.; Fu, Q.; Lian, C.; Shan, P. TGC-YOLOv5: An Enhanced YOLOv5 Drone Detection Model Based on Transformer, GAM & CA Attention Mechanism. Drones 2023, 7, 446. https://doi.org/10.3390/drones7070446