
Small target detection in drone aerial images based on feature fusion

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

The use of object detection technology on unmanned aerial vehicles (UAVs) is a crucial area of research in computer vision. Aerial images captured by drones differ from conventional images in object shape and scale, which can cause object detection algorithms to miss or misidentify small targets. This paper makes improvements based on the YOLOv5 algorithm. A small target detection layer is introduced to improve the model's detection capability at different scales. A cross-channel fusion module and a multi-level feature fusion downsampling module are added to obtain more comprehensive context information, making the network attend more closely to the important features of small targets. Additionally, the classification and regression tasks of the detection head are decoupled to speed up the model's convergence and improve detection accuracy. Finally, a new loss function is proposed to further improve the accuracy and convergence rate of the detector. The algorithm is evaluated on the VisDrone2019 dataset and compared with the YOLOv5s algorithm. The results show improvements of 4.7% in mAP0.5, 3.0% in mAP0.5:0.95, 3.6% in precision, and 6.4% in recall. The algorithm was also evaluated on the DIOR dataset, where mAP0.5:0.95 improved by 1.5%. These findings demonstrate the algorithm's effectiveness in detecting small targets in aerial images captured by drones.
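To make the head-decoupling idea above concrete, the sketch below shows a YOLO-style detection head with separate classification and regression branches, in the spirit of YOLOX [21]. This is a minimal illustrative sketch, not the authors' implementation: the module names (ConvBNSiLU, DecoupledHead), the branch widths, and the use of a stride-4 P2 feature map to stand in for the added small-target detection layer are all assumptions.

```python
# Hedged sketch of a decoupled detection head (YOLOX-style), for illustration only.
# Channel counts, module names, and the P2 example are assumptions, not the paper's code.
import torch
import torch.nn as nn


class ConvBNSiLU(nn.Module):
    """Conv + BatchNorm + SiLU, the basic block used throughout YOLOv5-style networks."""

    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class DecoupledHead(nn.Module):
    """Separate classification and regression towers on one feature-pyramid level."""

    def __init__(self, c_in, num_classes, num_anchors=1, width=256):
        super().__init__()
        self.stem = ConvBNSiLU(c_in, width, k=1)
        # Classification branch: its own conv tower plus a 1x1 prediction layer.
        self.cls_tower = nn.Sequential(ConvBNSiLU(width, width), ConvBNSiLU(width, width))
        self.cls_pred = nn.Conv2d(width, num_anchors * num_classes, 1)
        # Regression branch: box offsets and objectness share a second tower.
        self.reg_tower = nn.Sequential(ConvBNSiLU(width, width), ConvBNSiLU(width, width))
        self.reg_pred = nn.Conv2d(width, num_anchors * 4, 1)
        self.obj_pred = nn.Conv2d(width, num_anchors * 1, 1)

    def forward(self, x):
        x = self.stem(x)
        cls_out = self.cls_pred(self.cls_tower(x))
        reg_feat = self.reg_tower(x)
        return cls_out, self.reg_pred(reg_feat), self.obj_pred(reg_feat)


if __name__ == "__main__":
    # Example: run the head on a hypothetical P2 feature map (stride 4), the kind of
    # high-resolution level an extra small-target detection layer would operate on.
    head = DecoupledHead(c_in=128, num_classes=10)
    p2 = torch.randn(1, 128, 160, 160)  # 640x640 input at stride 4
    cls, box, obj = head(p2)
    print(cls.shape, box.shape, obj.shape)
```

Decoupling lets each branch specialize (classification on channel-wise semantics, box regression on spatial localization cues), which is consistent with the abstract's claim of faster convergence and higher accuracy.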



Data availability

The VisDrone2019 and DIOR datasets can be found in References 37 and 38. The VisDrone2019 dataset was collected by the AISKYEYE team of Tianjin University's Machine Learning and Data Mining Lab, and the entire baseline dataset was captured by drones. DIOR is a dataset for rotating target detection, jointly published by the Institute of Automation of the Chinese Academy of Sciences and Dahua Technology.

References

  1. Girshick, R., Donahue, J., Darrell, T., et al.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014). https://doi.org/10.48550/arXiv.1311.2524

  2. Girshick, R.: Fast R-CNN. In: 2015 IEEE International Conference on Computer Vision, pp. 1440–1448 (2015). https://doi.org/10.1109/ICCV.2015.169

  3. Ren, S., He, K., Girshick, R., et al.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017)

  4. Liu, W., Anguelov, D., Erhan, D., et al.: SSD: single shot MultiBox detector. In: Computer Vision – ECCV 2016, pp. 21–37 (2016). https://doi.org/10.1007/978-3-319-46448-0_2

  5. Redmon, J., Divvala, S., Girshick, R., et al.: You only look once: unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016). https://doi.org/10.1109/CVPR.2016.91

  6. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017). https://doi.org/10.1109/CVPR.2017.690

  7. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)

  8. Bochkovskiy, A., Wang, C.Y., et al.: YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)

  9. Jocher, G.: YOLOv5 release v6.0 (2022). https://github.com/ultralytics/yolov5/releases/tag/v6.0. Accessed 26 June 2023

  10. Li, C., Li, L., Jiang, H., et al.: YOLOv6: a single-stage object detection framework for industrial applications. arXiv preprint arXiv:2209.02976 (2022)

  11. Wang, C., Bochkovskiy, A., et al.: YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7464–7475 (2023). https://doi.org/10.1109/CVPR52729.2023.00721

  12. Lin, T., Maire, M., Belongie, S., et al.: Microsoft COCO: common objects in context. In: Computer Vision – ECCV 2014, pp. 740–755 (2014). https://doi.org/10.1007/978-3-319-10602-1_48

  13. Zhang, R., Shao, Z., Huang, X., et al.: Object detection in UAV images via global density fused convolutional network. Remote Sens. 12(19), 3140 (2020). https://doi.org/10.3390/rs12193140

  14. Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122 (2015)

  15. Liu, S., Zha, J., Sun, J., et al.: EdgeYOLO: an edge-real-time object detector. In: 2023 42nd Chinese Control Conference, pp. 7507–7512 (2023). https://doi.org/10.23919/CCC58697.2023.10239786

  16. Zhou, L., Liu, Z., Zhao, H., et al.: A multi-scale object detector based on coordinate and global information aggregation for UAV aerial images. Remote Sens. 15(14), 3468 (2023). https://doi.org/10.3390/rs15143468

  17. Yu, W., Yang, T., Chen, C.: Towards resolving the challenge of long-tail distribution in UAV images for object detection. In: 2021 IEEE Winter Conference on Applications of Computer Vision, pp. 3257–3266 (2021). https://doi.org/10.1109/WACV48630.2021.00330

  18. Tan, M., Pang, R., Le, Q., et al.: EfficientDet: scalable and efficient object detection. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10778–10787 (2020). https://doi.org/10.1109/CVPR42600.2020.01079

  19. Liu, S., Huang, D., Wang, Y.: Receptive field block net for accurate and fast object detection. In: Proceedings of the European Conference on Computer Vision, pp. 385–400 (2018). https://doi.org/10.1007/978-3-030-01252-6_24

  20. Song, G., Liu, Y., Wang, X.: Revisiting the sibling head in object detector. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11560–11569 (2020). https://doi.org/10.1109/CVPR42600.2020.01158

  21. Ge, Z., Liu, S., Wang, F., et al.: YOLOX: exceeding YOLO series in 2021. arXiv preprint arXiv:2107.08430 (2021)

  22. Zhu, X., Lyu, S., Wang, X., et al.: TPH-YOLOv5: improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2778–2788 (2021). https://doi.org/10.1109/ICCVW54120.2021.00312

  23. Huang, R., Pedoeem, J., Chen, C., et al.: YOLO-LITE: a real-time object detection algorithm optimized for non-GPU computers. In: 2018 IEEE International Conference on Big Data, pp. 2503–2510 (2018). https://doi.org/10.1109/BigData.2018.8621865

  24. Lin, T., Dollár, P., Girshick, R., et al.: Feature pyramid networks for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 936–944 (2017). https://doi.org/10.1109/CVPR.2017.106

  25. Liu, S., Qi, L., Qin, H., et al.: Path aggregation network for instance segmentation. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8759–8768 (2018). https://doi.org/10.1109/CVPR.2018.00913

  26. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90

  27. Yu, J., Jiang, Y., Wang, Z., et al.: UnitBox: an advanced object detection network. In: Proceedings of the 24th ACM International Conference on Multimedia, pp. 516–520 (2016). https://doi.org/10.1145/2964284.2967274

  28. Zheng, Z., Wang, P., Liu, W., et al.: Distance-IoU loss: faster and better learning for bounding box regression. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 12993–13000 (2020). https://doi.org/10.1609/aaai.v34i07.6999

  29. Rezatofighi, H., Tsoi, N., Gwak, J., et al.: Generalized intersection over union: a metric and a loss for bounding box regression. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 658–666 (2019). https://doi.org/10.1109/CVPR.2019.00075

  30. Zhang, H., Wang, Y., Dayoub, F.: VarifocalNet: an IoU-aware dense object detector. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8510–8519 (2021). https://doi.org/10.1109/CVPR46437.2021.00841

  31. Shao, Z., Lyu, H., Yin, Y., Cheng, T., et al.: Multi-scale object detection model for autonomous ship navigation in maritime environment. J. Mar. Sci. Eng. 10(11), 1783 (2022). https://doi.org/10.3390/jmse10111783

  32. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). https://doi.org/10.5555/3045118.3045167

  33. Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw. 107, 3–11 (2018). https://doi.org/10.1016/j.neunet.2017.12.012

  34. Srinivas, A., Lin, T., Parmar, N., et al.: Bottleneck transformers for visual recognition. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16514–16524 (2021). https://doi.org/10.1109/CVPR46437.2021.01625

  35. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018). https://doi.org/10.1109/CVPR.2018.00745

  36. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, vol. 15, pp. 315–323 (2011)

  37. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1800–1807 (2017). https://doi.org/10.1109/CVPR.2017.195

  38. Du, D., Zhu, P., et al.: VisDrone-DET2019: the vision meets drone object detection in image challenge results. In: 2019 IEEE/CVF International Conference on Computer Vision Workshop, pp. 213–226 (2019). https://doi.org/10.1109/ICCVW.2019.00030

  39. Guo, H., Bai, H., Yuan, Y., et al.: Fully deformable convolutional network for ship detection in remote sensing imagery. Remote Sens. 14(8), 1850 (2022). https://doi.org/10.3390/rs14081850


Author information


Contributions

MA: Data analysis and writing. WH: Formal analysis. MW: Validation. CY: Methodology.

Corresponding author

Correspondence to Huajun Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mu, A., Wang, H., Meng, W. et al. Small target detection in drone aerial images based on feature fusion. SIViP 18 (Suppl 1), 585–598 (2024). https://doi.org/10.1007/s11760-024-03176-3

