
Doublem-net: multi-scale spatial pyramid pooling-fast and multi-path adaptive feature pyramid network for UAV detection

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

Unmanned aerial vehicles (UAVs) are widely used in military, rescue, and traffic-monitoring applications owing to their flexibility, low cost, and autonomous flight capabilities. However, because of the drone's flight altitude and shooting angle, objects in aerial images are smaller, denser, and more complex than those in general images, which degrades detection performance. In this paper, we propose a UAV detection model called DoubleM-Net, which combines multi-scale spatial pyramid pooling-fast (MS-SPPF) with a Multi-Path Adaptive Feature Pyramid Network (MPA-FPN). First, the backbone network serves as the feature extractor, producing feature maps over multiple scale ranges from the input image. Second, the MS-SPPF module repeatedly applies pooling kernels of different sizes at each scale to obtain rich multi-receptive-field features. Finally, the MPA-FPN module first fuses semantic information between each pair of adjacent scale levels; top-level features are then passed back down level by level to enhance the lower-level features, enabling interaction and integration of features across scales. Experimental results show that DoubleM-Net achieves 27.5% mAP50-95 on the VisDrone dataset, and 55.0% and 60.4% on the DroneVehicle dataset in RGB and infrared modes, respectively. Our model demonstrates excellent performance on air-to-ground image detection tasks, with particularly strong results on small objects.
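The serial multi-kernel pooling idea behind MS-SPPF can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function names (`max_pool2d`, `ms_sppf`), the kernel sizes, and the repeat count are hypothetical, and the sketch only shows the general SPPF-style pattern the abstract describes: stride-1 max pooling applied several times per kernel size, with all intermediate maps concatenated along the channel axis.

```python
import numpy as np

def max_pool2d(x, k):
    """Stride-1 max pooling with 'same' padding on a (C, H, W) array."""
    pad = k // 2
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(H):
        for j in range(W):
            out[:, i, j] = xp[:, i:i + k, j:j + k].max(axis=(1, 2))
    return out

def ms_sppf(x, kernels=(3, 5), repeats=2):
    """Hypothetical MS-SPPF sketch: for each kernel size, pool the input
    `repeats` times in series (each pass enlarges the effective receptive
    field), then concatenate the input with every intermediate map."""
    outs = [x]
    for k in kernels:
        y = x
        for _ in range(repeats):
            y = max_pool2d(y, k)
            outs.append(y)
    return np.concatenate(outs, axis=0)

feat = np.random.rand(4, 16, 16)   # toy (C, H, W) feature map
fused = ms_sppf(feat)
print(fused.shape)                  # (4 * (1 + 2*2), 16, 16) = (20, 16, 16)
```

In a real detector these pooling branches would operate on convolutional feature maps and be followed by a 1x1 convolution to compress the concatenated channels; the sketch keeps only the multi-kernel, multi-pass pooling structure.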




Data availability and access

All data, models, and code generated and used in this study are available from the corresponding author upon reasonable request. The code will be uploaded to https://github.com/yangwygithub/PaperCode.git, branch: DoubleM-Net_Zhongxu-Li2023.


Acknowledgements

This research is supported by the National Natural Science Foundation of China under Grant Nos. 62376114 and 12101289, and by the Natural Science Foundation of Fujian Province under Grant Nos. 2020J01821 and 2022J01891. It is also supported by the Institute of Meteorological Big Data-Digital Fujian and the Fujian Key Laboratory of Data Science and Statistics (Minnan Normal University), China.

Author information

Authors and Affiliations

Authors

Contributions

Zhongxu Li, Qihan He, Hong Zhao and Wenyuan Yang contributed equally to this work.

Corresponding author

Correspondence to Wenyuan Yang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical and informed consent for data used

This article does not contain any studies with human participants or animals performed by any of the authors, and informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, Z., He, Q., Zhao, H. et al. Doublem-net: multi-scale spatial pyramid pooling-fast and multi-path adaptive feature pyramid network for UAV detection. Int. J. Mach. Learn. & Cyber. 15, 5781–5805 (2024). https://doi.org/10.1007/s13042-024-02278-1

