Abstract
Object detection for unmanned aerial vehicles is an important research area in computer vision. Objects in aerial images captured by drones differ in shape and size from those in conventional images, which can cause detection algorithms to miss or misidentify small targets. This paper proposes an improved algorithm based on YOLOv5. A small target detection layer is introduced to improve the model’s detection capability at different scales. A cross-channel fusion module and a multi-level feature fusion downsampling module are added to obtain more comprehensive context information, so that the network pays more attention to the important features of small targets. Additionally, the classification and regression tasks of the detection head are decoupled to speed up the model’s convergence and improve detection accuracy. Finally, a new loss function is proposed to further improve the accuracy and convergence rate of the detector. The algorithm is evaluated on the VisDrone2019 dataset and compared with the YOLOv5s algorithm. The results show improvements of 4.7% in mAP0.5, 3.0% in mAP0.5:0.95, 3.6% in precision, and 6.4% in recall. The algorithm was also evaluated on the DIOR dataset, where mAP0.5:0.95 improved by 1.5%. These findings demonstrate the algorithm’s effectiveness in detecting small targets in aerial images captured by drones.
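To make the reported metrics concrete, the following is a minimal sketch of the IoU thresholds behind mAP0.5 and mAP0.5:0.95, assuming the common COCO-style convention (thresholds from 0.50 to 0.95 in steps of 0.05). It is not the paper's evaluation code: the corner box format `(x1, y1, x2, y2)` and the names `iou`, `matches_at`, `MAP50_THRESHOLDS`, and `MAP50_95_THRESHOLDS` are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) corners in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: a single one for mAP0.5,
# ten (0.50, 0.55, ..., 0.95) for mAP0.5:0.95.
MAP50_THRESHOLDS = [0.5]
MAP50_95_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matches_at(pred: Box, truth: Box, thresholds: Sequence[float]) -> List[bool]:
    """For each IoU threshold, whether the prediction would count as a match."""
    score = iou(pred, truth)
    return [score >= t for t in thresholds]


if __name__ == "__main__":
    # The same 2-pixel localization error on a small and on a large box.
    small = matches_at((2, 2, 12, 12), (0, 0, 10, 10), MAP50_95_THRESHOLDS)
    large = matches_at((2, 2, 102, 102), (0, 0, 100, 100), MAP50_95_THRESHOLDS)
    print(sum(small), "of 10 thresholds matched for the 10x10 box")
    print(sum(large), "of 10 thresholds matched for the 100x100 box")
```

In this sketch an identical 2-pixel offset leaves the 10x10 box below every threshold, while the 100x100 box still matches at most of them, which illustrates why small targets are penalized more heavily by IoU-based metrics.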
Data availability
The VisDrone2019 and DIOR datasets can be found in References 37 and 38. The VisDrone2019 dataset was collected by the AISKYEYE team at the Machine Learning and Data Mining Lab of Tianjin University, and the entire baseline dataset was captured by drone. DIOR is a dataset for rotating target detection, jointly published by the Institute of Automation of the Chinese Academy of Sciences and Dahua Technology.
References
Girshick, R., Donahue, J., Darrell, T., et al.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014). https://doi.org/10.48550/arXiv.1311.2524
Girshick, R.: Fast R-CNN. In: 2015 IEEE International Conference on Computer Vision, pp. 1440–1448 (2015). https://doi.org/10.1109/ICCV.2015.169
Ren, S., He, K., Girshick, R., et al.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017)
Liu, W., Anguelov, D., Erhan, D., et al.: SSD: single shot MultiBox detector. In: Computer Vision – ECCV 2016, pp. 21–37 (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Redmon, J., Divvala, S., Girshick, R., et al.: You only look once: unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016). https://doi.org/10.1109/CVPR.2016.91
Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017). https://doi.org/10.1109/CVPR.2017.690
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
Bochkovskiy, A., Wang, C.Y., et al.: YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
Jocher, G.: YOLOv5 release v6.0. https://github.com/ultralytics/yolov5/releases/tag/v6.0. Accessed 26 June 2023 (2022)
Li, C., Li, L., Jiang, H., et al.: YOLOv6: a single-stage object detection framework for industrial applications. arXiv preprint arXiv:2209.02976 (2022)
Wang, C., Bochkovskiy, A., et al.: YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7464–7475 (2023). https://doi.org/10.1109/CVPR52729.2023.00721
Lin, T., Maire, M., Belongie, S., et al.: Microsoft COCO: common objects in context. Comput. Vis. ECCV 2014, 740–755 (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Zhang, R., Shao, Z., Huang, X., et al.: Object detection in UAV images via global density fused convolutional network. Remote Sens. 12(19), 3140 (2020). https://doi.org/10.3390/rs12193140
Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)
Liu, S., Zha, J., Sun, J., et al.: EdgeYOLO: an edge-real-time object detector. In: 2023 42nd Chinese Control Conference, pp. 7507–7512 (2023). https://doi.org/10.23919/CCC58697.2023.10239786
Zhou, L., Liu, Z., Zhao, H., et al.: A multi-scale object detector based on coordinate and global information aggregation for UAV aerial images. Remote Sens. 15(14), 3468 (2023). https://doi.org/10.3390/rs15143468
Yu, W., Yang, T., Chen, C.: Towards resolving the challenge of long-tail distribution in UAV images for object detection. In: 2021 IEEE Winter Conference on Applications of Computer Vision, pp. 3257–3266 (2021). https://doi.org/10.1109/WACV48630.2021.00330
Tan, M., Pang, R., Le, Q.: EfficientDet: scalable and efficient object detection. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10778–10787 (2020). https://doi.org/10.1109/CVPR42600.2020.01079
Liu, S., Huang, D., Wang, Y.: Receptive field block net for accurate and fast object detection. In: Proceedings of the European Conference on Computer Vision, pp. 385–400 (2018). https://doi.org/10.1007/978-3-030-01252-6_24
Song, G., Liu, Y., Wang, X.: Revisiting the sibling head in object detector. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11560–11569 (2020). https://doi.org/10.1109/CVPR42600.2020.01158
Ge, Z., Liu, S., Wang, F., et al.: YOLOX: exceeding YOLO series in 2021. arXiv preprint arXiv:2107.08430 (2021)
Zhu, X., Lyu, S., Wang, X., et al.: TPH-YOLOv5: improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2778–2788 (2021). https://doi.org/10.1109/ICCVW54120.2021.00312
Huang, R., Pedoeem, J., Chen, C., et al.: YOLO-LITE: a real-time object detection algorithm optimized for non-GPU computers. In: 2018 IEEE International Conference on Big Data, pp. 2503–2510 (2018). https://doi.org/10.1109/BigData.2018.8621865
Lin, T., Dollár, P., Girshick, R., et al.: Feature pyramid networks for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017). https://doi.org/10.1109/CVPR.2017.106
Liu, S., Qi, L., Qin, H., et al.: Path aggregation network for instance segmentation. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8759–8768 (2018). https://doi.org/10.1109/CVPR.2018.00913
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
Yu, J., Jiang, Y., Wang, Z., et al.: UnitBox: an advanced object detection network. In: Proceedings of the 24th ACM International Conference on Multimedia, pp. 516–520 (2016). https://doi.org/10.1145/2964284.2967274
Zheng, Z., Wang, P., Liu, W., et al.: Distance-IoU loss: faster and better learning for bounding box regression. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 12993–13000 (2020). https://doi.org/10.1609/aaai.v34i07.6999
Rezatofighi, H., Tsoi, N., Gwak, J., et al.: Generalized Intersection over union: a metric and a loss for bounding box regression. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 658–666 (2019). https://doi.org/10.1109/CVPR.2019.00075
Zhang, H., Wang, Y., Dayoub, F.: VarifocalNet: an IoU-aware dense object detector. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8510–8519 (2021). https://doi.org/10.1109/CVPR46437.2021.00841
Shao, Z., Lyu, H., Yin, Y., Cheng, T., et al.: Multi-scale object detection model for autonomous ship navigation in maritime environment. J. Mar. Sci. Eng. 10(11), 1783 (2022). https://doi.org/10.3390/jmse10111783
Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). https://doi.org/10.5555/3045118.3045167
Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw. 107, 3–11 (2018). https://doi.org/10.1016/j.neunet.2017.12.012
Srinivas, A., Lin, T., Parmar, N., et al.: Bottleneck transformers for visual recognition. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16514–16524 (2021). https://doi.org/10.1109/CVPR46437.2021.01625
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018). https://doi.org/10.1109/CVPR.2018.00745
Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, vol. 15, pp. 315–323 (2011)
Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1800–1807 (2017). https://doi.org/10.1109/CVPR.2017.195
Du, D., Zhu, P., et al.: VisDrone-DET2019: the vision meets drone object detection in image challenge results. In: 2019 IEEE/CVF International Conference on Computer Vision Workshop, pp. 213–226 (2019). https://doi.org/10.1109/ICCVW.2019.00030
Guo, H., Bai, H., Yuan, Y., et al.: Fully deformable convolutional network for ship detection in remote sensing imagery. Remote Sens. 14(8), 1850 (2022). https://doi.org/10.3390/rs14081850
Author information
Contributions
MA: Data analysis and Writing. WH: Formal analysis. MW: Validation. CY: Methodology.
Ethics declarations
Conflict of interest
The authors declare that this study has no conflict of interest.
About this article
Cite this article
Mu, A., Wang, H., Meng, W. et al. Small target detection in drone aerial images based on feature fusion. SIViP 18 (Suppl 1), 585–598 (2024). https://doi.org/10.1007/s11760-024-03176-3
DOI: https://doi.org/10.1007/s11760-024-03176-3