MFRENet: efficient detection of drone image based on multiscale feature aggregation and receptive field expanded

  • Original Article
  • Published in Pattern Analysis and Applications

Abstract

Object detection in drone-captured images is attracting growing research interest. However, most drone images contain many densely packed small objects, so detecting these objects efficiently and classifying them accurately remains a formidable challenge. To address this, we introduce an effective object detection network for drone images based on Multiscale Feature aggregation and Receptive field Expansion (MFRENet). First, we design the Receptive Field Expanded Feature Extraction Module (RFEFE), which improves the model's perception of objects with irregular shapes and varying sizes. Next, we introduce the Multiscale Cross Stage Parallel Feature Fusion Module (MCSPFF), which integrates the RFEFE module, and add a Shuffle Attention module so that MCSPFF captures more semantic information. Then, we propose the Extended Simplified Spatial Pyramid Pooling-Fast and Feature Enhancement Module (ESimSPP2FE), which is inspired by the attention mechanism and enhances the features of small objects. Finally, we add a detection head dedicated to small targets, which strengthens the model's ability to detect them. Comprehensive experiments on the VisDrone2021-DET dataset compare the proposed model with the baseline YOLOv8m. The results show that, relative to YOLOv8m, the proposed model improves mAP and AP50 by 1.9% and 2.7%, respectively. The code is available at https://github.com/chenhao-123-sudo/MFRENet-achive.
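
The abstract reports mAP and AP50 without defining them. The sketch below illustrates the usual convention, which is assumed here rather than taken from the article: a detection counts as correct when its intersection over union (IoU) with an unmatched ground-truth box reaches a threshold, AP50 is average precision at a threshold of 0.5, and mAP typically averages AP over stricter thresholds as well. The function names `iou` and `average_precision` are illustrative and do not come from the MFRENet repository. The code handles one image and one class with all-point interpolation, which is simpler than the official VisDrone evaluation.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, threshold=0.5):
    """AP for one class on one image (simplified, hypothetical helper).

    preds: list of (score, box) pairs; gts: list of ground-truth boxes.
    threshold=0.5 corresponds to the AP50 setting.
    """
    if not gts:
        return 0.0
    # Process detections from most to least confident.
    preds = sorted(preds, key=lambda p: p[0], reverse=True)
    matched = set()
    tp_flags = []
    for _, box in preds:
        # Find the best-overlapping ground truth that is still unmatched.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            value = iou(box, gt)
            if value > best_iou:
                best_iou, best_j = value, j
        if best_j >= 0 and best_iou >= threshold:
            matched.add(best_j)
            tp_flags.append(True)
        else:
            tp_flags.append(False)

    # Running precision and recall after each detection.
    precisions, recalls = [], []
    tp = 0
    for i, flag in enumerate(tp_flags, start=1):
        tp += int(flag)
        precisions.append(tp / i)
        recalls.append(tp / len(gts))

    # Make precision monotonically non-increasing (the precision envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the precision-recall envelope.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Toy data: two ground-truth boxes, three detections of varying quality.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [
    (0.9, (0, 0, 10, 10)),    # exact match
    (0.8, (50, 50, 60, 60)),  # no overlap with any ground truth
    (0.7, (21, 21, 30, 30)),  # slightly offset match
]
ap50 = average_precision(preds, gts, threshold=0.5)
ap90 = average_precision(preds, gts, threshold=0.9)
```

In this toy case the slightly offset box is accepted at a threshold of 0.5 but rejected at 0.9, so `ap90` is lower than `ap50`. This is why a metric averaged over stricter thresholds rewards tighter localization, which is hard to achieve on small objects.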

Availability of data and materials

The VisDrone dataset that supports the findings of this study is available at https://gitcode.com/visdrone/visdrone-dataset/overview.

Funding

The authors thank the Natural Science Foundation of Hebei Province (F2024201012) and the Post-graduate’s Innovation Fund Project of Hebei University (HBU2024SS032) for their financial support and the support of the High-Performance Computing Center of Hebei University.

Author information

Contributions

CH: Conceptualization, Methodology, Software, Visualization, Writing—original draft. WY: Supervision, Investigation, Writing—review and editing. GZ: Reviewing, Language Correction. GZ: Language Correction. ZN: Language Correction.

Corresponding author

Correspondence to Wenzhu Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, H., Yang, W., Zhou, G. et al. MFRENet: efficient detection of drone image based on multiscale feature aggregation and receptive field expanded. Pattern Anal Applic 27, 120 (2024). https://doi.org/10.1007/s10044-024-01337-1

  • DOI: https://doi.org/10.1007/s10044-024-01337-1
