Abstract
Object detection in drone-captured images is attracting growing research interest. However, because most drone images contain many densely packed small objects, detecting these objects efficiently and classifying them accurately remains a formidable challenge. To address this problem, we introduce an effective object detection network for drone images based on Multiscale Feature aggregation and Receptive field Expansion (MFRENet). First, we design the Receptive Field Expanded Feature Extraction Module (RFEFE), which improves the model's ability to perceive objects with irregular shapes and varying sizes. Next, we introduce the Multiscale Cross Stage Parallel Feature Fusion Module (MCSPFF), which integrates the RFEFE module, and add the Shuffle Attention module so that MCSPFF captures more semantic information. Then, we propose the Extended Simplified Spatial Pyramid Pooling-Fast and Feature Enhancement Module (ESimSPP2FE), which is inspired by the attention mechanism and enhances the features of small objects. Finally, we add a detection head dedicated to small targets, which further strengthens the model's ability to detect them. Comprehensive experiments are performed on the VisDrone2021-DET dataset, with YOLOv8m as the baseline. The results show that, compared with YOLOv8m, the proposed model improves mAP and AP50 by 1.9% and 2.7%, respectively. The code is available at https://github.com/chenhao-123-sudo/MFRENet-achive.
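To make the AP50 figure above concrete, the following minimal sketch illustrates the matching criterion that AP50 is conventionally built on: a predicted box counts as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. This is not taken from the MFRENet code; the function names `iou` and `is_true_positive`, the `(x1, y1, x2, y2)` box format, and the example boxes are assumptions chosen for illustration only.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: AP50-style match test at an IoU threshold of 0.5."""
    return iou(pred_box, gt_box) >= threshold


# Illustrative (made-up) boxes: the same 3-pixel localization error
# applied to a 10x10 object and to a 100x100 object.
small_gt, small_pred = (10, 10, 20, 20), (13, 13, 23, 23)
large_gt, large_pred = (0, 0, 100, 100), (3, 3, 103, 103)

print(is_true_positive(small_pred, small_gt))  # small object: match fails
print(is_true_positive(large_pred, large_gt))  # large object: match succeeds
```

In this example the 3-pixel offset drops the IoU of the small box to 49/151 (about 0.32), below the 0.5 threshold, while the large box keeps an IoU of roughly 0.89. This is one reason densely packed small objects are hard to score well on, and why a dedicated small-target head can matter.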
Availability of data and materials
The VisDrone dataset that supports the findings of this study is available at https://gitcode.com/visdrone/visdrone-dataset/overview.
Funding
The authors thank the Natural Science Foundation of Hebei Province (F2024201012) and the Post-graduate’s Innovation Fund Project of Hebei University (HBU2024SS032) for their financial support, and the High-Performance Computing Center of Hebei University for its support.
Author information
Contributions
CH: Conceptualization, Methodology, Software, Visualization, Writing—original draft. WY: Supervision, Investigation, Writing—review and editing. GZ: Reviewing, Language Correction. GZ: Language Correction. ZN: Language Correction.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
Not applicable.
About this article
Cite this article
Chen, H., Yang, W., Zhou, G. et al. MFRENet: efficient detection of drone image based on multiscale feature aggregation and receptive field expanded. Pattern Anal Applic 27, 120 (2024). https://doi.org/10.1007/s10044-024-01337-1
DOI: https://doi.org/10.1007/s10044-024-01337-1