Small target detection in drone aerial images based on feature fusion

Aiming Mu¹,
Huajun Wang¹,
Wenjie Meng¹ &
Yufeng Chen¹

Abstract

Object detection for unmanned aerial vehicles (UAVs) is an important research area in computer vision. Objects in aerial images captured by drones differ in shape and size from those in conventional images, which can cause object detection algorithms to miss or misidentify small targets. This paper proposes improvements to the YOLOv5 algorithm. A small target detection layer is introduced to improve the model’s detection capability at different scales. A cross-channel fusion module and a multi-level feature fusion downsampling module are added to obtain more comprehensive context information, so that the network pays more attention to the important features of small targets. Additionally, the classification and regression tasks of the detection head are decoupled to speed up the model’s convergence and improve detection accuracy. Finally, a new loss function is proposed to further improve the accuracy and convergence rate of the detector. The algorithm is evaluated on the VisDrone2019 dataset and compared with the YOLOv5s algorithm. The results show improvements of 4.7% in mAP0.5, 3.0% in mAP0.5:0.95, 3.6% in precision, and 6.4% in recall. The algorithm was also evaluated on the DIOR dataset, where mAP0.5:0.95 improved by 1.5%. These findings demonstrate the algorithm’s effectiveness in detecting small targets in aerial images captured by drones.
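
The metrics quoted in the abstract all depend on an intersection-over-union (IoU) threshold for deciding whether a predicted box counts as a correct detection. The following is a minimal Python sketch of that idea, not the authors' evaluation code: the function names `iou` and `precision_recall`, the `(x1, y1, x2, y2)` box format, and the greedy matching rule are assumptions made for illustration. It computes precision and recall at a single operating point for one class. Full mAP additionally integrates the precision-recall curve and averages over classes, and mAP0.5:0.95 conventionally averages over IoU thresholds from 0.5 to 0.95 in steps of 0.05.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Tuple[Box, float]],
    gts: List[Box],
    iou_thr: float = 0.5,
) -> Tuple[float, float]:
    """Precision and recall for one class at a single IoU threshold.

    Predictions are (box, confidence) pairs. They are matched greedily in
    descending confidence order, and each ground-truth box can be matched
    at most once (an assumed, simplified matching rule).
    """
    matched = set()
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp  # predictions with no sufficiently overlapping target
    fn = len(gts) - tp    # targets that no prediction matched
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    return precision, recall
```

A prediction offset by one pixel from a 10 × 10 target has an IoU of 81/119, roughly 0.68, so it counts as correct at a threshold of 0.5 but not at 0.75. This sensitivity is why small targets are penalised more heavily under the stricter mAP0.5:0.95 metric than under mAP0.5.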

Data availability

The VisDrone2019 and DIOR datasets can be found in References 37, 38. The VisDrone2019 dataset was collected by the AISKYEYE team at the Machine Learning and Data Mining Lab of Tianjin University, and the entire baseline dataset was captured by drone. DIOR is a dataset for rotating target detection, jointly published by the Institute of Automation of the Chinese Academy of Sciences and Dahua Technology.

References

Girshick, R., Donahue, J., Darrell, T., et al.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014). https://doi.org/10.48550/arXiv.1311.2524
Girshick, R.: Fast R-CNN. In: 2015 IEEE International Conference on Computer Vision, pp. 1440–1448 (2015). https://doi.org/10.1109/ICCV.2015.169
Ren, S., He, K., Girshick, R., et al.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017)
Liu, W., Anguelov, D., Erhan, D., et al.: SSD: single shot MultiBox detector. In: Computer vision-ECCV, pp. 21–37 (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Redmon, J., Divvala, S., Girshick, R., et al.: You only look once: unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016). https://doi.org/10.1109/CVPR.2016.91
Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017). https://doi.org/10.1109/CVPR.2017.690
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
Bochkovskiy, A., Wang, C.Y., et al.: YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
Jocher, G.: YOLOv5 release v6.0. https://github.com/ultralytics/yolov5/releases/tag/v6.0 (2022). Accessed 26 June 2023
Li, C., Li, L., Jiang, H., et al.: YOLOv6: a single-stage object detection framework for industrial applications. arXiv preprint arXiv:2209.02976 (2022)
Wang, C., Bochkovskiy, A., et al.: YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7464–7475 (2023). https://doi.org/10.1109/CVPR52729.2023.00721
Lin, T., Maire, M., Belongie, S., et al.: Microsoft COCO: common objects in context. Comput. Vis. ECCV 2014, 740–755 (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Zhang, R., Shao, Z., Huang, X., et al.: Object detection in UAV images via global density fused convolutional network. Remote Sens. 12(19), 3140 (2020). https://doi.org/10.3390/rs12193140
Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)
Liu, S., Zha, J., Sun, J., et al.: EdgeYOLO: an edge-real-time object detector. In: 2023 42nd Chinese Control Conference, pp. 7507–7512 (2023). https://doi.org/10.23919/CCC58697.2023.10239786
Zhou, L., Liu, Z., Zhao, H., et al.: A multi-scale object detector based on coordinate and global information aggregation for UAV aerial images. Remote Sens. 15(14), 3468 (2023). https://doi.org/10.3390/rs15143468
Yu, W., Yang, T., Chen, C.: Towards resolving the challenge of long-tail distribution in UAV images for object detection. In: 2021 IEEE Winter Conference on Applications of Computer Vision, pp. 3257–3266 (2021). https://doi.org/10.1109/WACV48630.2021.00330
Tan, M., Pang, R., Le, Q., et al.: EfficientDet: scalable and efficient object detection. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10778–10787 (2020). https://doi.org/10.1109/CVPR42600.2020.01079
Liu, S., Huang, D., Wang, Y.: Receptive field block net for accurate and fast object detection. In: Proceedings of the European Conference on Computer Vision, pp. 385–400 (2018). https://doi.org/10.1007/978-3-030-01252-6_24
Song, G., Liu, Y., Wang, X.: Revisiting the sibling head in object detector. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11560–11569 (2020). https://doi.org/10.1109/CVPR42600.2020.01158
Ge, Z., Liu, S., Wang, F., et al.: YOLOX: exceeding YOLO series in 2021. arXiv preprint arXiv:2107.08430 (2021)
Zhu, X., Lyu, S., Wang, X., et al.: TPH-YOLOv5: improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pp. 2778–2788 (2021). https://doi.org/10.1109/ICCVW54120.2021.00312
Huang, R., Pedoeem, J., Chen, C., et al.: YOLO-LITE: a real-time object detection algorithm optimized for non-GPU computers. In: 2018 IEEE International Conference on Big Data, pp. 2503–2510 (2018). https://doi.org/10.1109/BigData.2018.8621865
Lin, T., Dollár, P., Girshick, R., et al.: Feature pyramid networks for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017). https://doi.org/10.1109/CVPR.2017.106
Liu, S., Qi, L., Qin, H., et al.: Path aggregation network for instance segmentation. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8759–8768 (2018). https://doi.org/10.1109/CVPR.2018.00913
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
Yu, J., Jiang, Y., Wang, Z., et al.: UnitBox: an advanced object detection network. In: Proceedings of the 24th ACM International Conference on Multimedia, pp. 516–520 (2016). https://doi.org/10.1145/2964284.2967274
Zheng, Z., Wang, P., Liu, W., et al.: Distance-IoU loss: faster and better learning for bounding box regression. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 12993–13000 (2020). https://doi.org/10.1609/aaai.v34i07.6999
Rezatofighi, H., Tsoi, N., Gwak, J., et al.: Generalized intersection over union: a metric and a loss for bounding box regression. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 658–666 (2019). https://doi.org/10.1109/CVPR.2019.00075
Zhang, H., Wang, Y., Dayoub, F.: VarifocalNet: an IoU-aware dense object detector. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8510–8519 (2021). https://doi.org/10.1109/CVPR46437.2021.00841
Shao, Z., Lyu, H., Yin, Y., Cheng, T., et al.: Multi-scale object detection model for autonomous ship navigation in maritime environment. J. Mar. Sci. Eng. 10(11), 1783 (2022). https://doi.org/10.3390/jmse10111783
Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). https://doi.org/10.5555/3045118.3045167
Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw. 107, 3–11 (2018). https://doi.org/10.1016/j.neunet.2017.12.012
Srinivas, A., Lin, T., Parmar, N., et al.: Bottleneck transformers for visual recognition. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16514–16524 (2021). https://doi.org/10.1109/CVPR46437.2021.01625
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018). https://doi.org/10.1109/CVPR.2018.00745
Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, vol. 15, pp. 315–323 (2011)
Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1800–1807 (2017). https://doi.org/10.1109/CVPR.2017.195
Du, D., Zhu, P., et al.: VisDrone-DET2019: the vision meets drone object detection in image challenge results. In: 2019 IEEE/CVF International Conference on Computer Vision Workshop, pp. 213–226 (2019). https://doi.org/10.1109/ICCVW.2019.00030
Guo, H., Bai, H., Yuan, Y., et al.: Fully deformable convolutional network for ship detection in remote sensing imagery. Remote Sens. 14(8), 1850 (2022). https://doi.org/10.3390/rs14081850

Author information

Authors and Affiliations

College of Computer Science and Cyber Security, Chengdu University of Technology, Chengdu, 610059, Sichuan, China
Aiming Mu, Huajun Wang, Wenjie Meng & Yufeng Chen

Contributions

MA: Data analysis and Writing. WH: Formal analysis. MW: Validation. CY: Methodology.

Corresponding author

Correspondence to Huajun Wang.

Ethics declarations

Conflict of interest

The authors declare that this study has no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mu, A., Wang, H., Meng, W. et al. Small target detection in drone aerial images based on feature fusion. SIViP 18 (Suppl 1), 585–598 (2024). https://doi.org/10.1007/s11760-024-03176-3

Received: 02 March 2024
Revised: 21 March 2024
Accepted: 22 March 2024
Published: 15 April 2024
Issue Date: August 2024
DOI: https://doi.org/10.1007/s11760-024-03176-3
