Abstract
Small objects occupy a relatively small proportion of remote sensing images, which makes them prone to feature loss or interference from the surrounding complex background during detection. To solve this problem, a remote sensing small object detection network (CD-YOLOX) built on YOLOX is proposed, based on an enhanced feature pyramid network (CS-PANet) and a global-local combination module (DSAN). First, CS-PANet is proposed to address the loss of small object features and the interference from the surrounding complex background caused by the repeated convolution and feature stacking operations in PANet. This method applies channel attention at the input of PANet to improve the network’s focus on the feature channels that are effective for small objects; in addition, the attention-weighted input is connected across layers to the output of PANet, so that PANet retains richer original information about small objects. Second, to further reduce the interference of the surrounding complex background on small objects, a DSAN module consisting of a self-attention mechanism, dilated convolution, and a residual connection is placed before network prediction. This module combines the self-attention mechanism with dilated convolution so that the network focuses on the global feature region of the small object in the feature map while complementing the local context of that region, and it preserves the original information through the residual connection. Finally, the method is evaluated on the remote sensing dataset NWPU VHR-10 and the general-purpose datasets KITTI and PASCAL VOC. Compared with the original network, the method improves detection accuracy by 5.56% on NWPU VHR-10 and by 1.93% and 2.51% on KITTI and PASCAL VOC, respectively, which verifies its effectiveness for detecting small objects in remote sensing images and its ability to detect general-purpose objects.
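To make the DSAN idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It works on a 1-D sequence of feature vectors rather than a 2-D feature map. The wiring (self-attention, then dilated convolution, then a residual sum), the identity query/key/value projections, the fixed smoothing kernel, and all function names are assumptions made for illustration, since the abstract does not specify these details.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention over positions.

    x is a list of feature vectors. Query, key, and value projections are
    the identity here (an assumption made to keep the sketch short).
    """
    scale = math.sqrt(len(x[0]))
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, x))
                    for c in range(len(q))])
    return out


def dilated_conv1d(x, kernel, dilation):
    """Per-channel 1-D dilated convolution with zero padding (same length)."""
    n = len(x)
    half = (len(kernel) // 2) * dilation
    out = []
    for i in range(n):
        acc = [0.0] * len(x[0])
        for j, w in enumerate(kernel):
            pos = i + j * dilation - half  # taps are spaced `dilation` apart
            if 0 <= pos < n:
                for c in range(len(acc)):
                    acc[c] += w * x[pos][c]
        out.append(acc)
    return out


def dsan_like_block(x, kernel=(0.25, 0.5, 0.25), dilation=2):
    """Hypothetical DSAN-style block: global attention, local dilated
    context, and a residual connection that keeps the original input."""
    attended = self_attention(x)
    context = dilated_conv1d(attended, kernel, dilation)
    return [[a + b for a, b in zip(xi, ci)] for xi, ci in zip(x, context)]


features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0], [0.0, 0.0]]
refined = dsan_like_block(features)
```

In this sketch the residual sum guarantees that the input passes through unchanged whenever the convolution branch contributes nothing, which is one way to read the abstract's statement that the original information is preserved.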
Availability of data and materials
The data used to support the findings of this study are available from the corresponding author upon request.
Code availability
Not applicable.
Funding
This work is supported by grants from the National Science Foundation of China (No. 62102373, 62006213) and the Science and Technology Research Project of Henan Province (No. 212102310053, 222102210118).
Author information
Contributions
Jie Zhang: Conceptualization; Methodology; Analysis; Resources; Writing (review and editing); Investigation; Supervision. Bowen Liu: Data curation; Investigation; Software; Validation; Writing (original draft and editing). Hongyan Zhang: Conceptualization; Methodology; Validation; Analysis; Review and editing; Visualization. Lei Zhang: Conceptualization; Methodology; Analysis; Resources; Investigation; Supervision. Fengxian Wang: Data curation; Investigation; Software; Resources. Yibin Chen: Investigation; Software; Resources; Data curation.
Ethics declarations
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Conflicts of interest
No potential conflict of interest was reported by the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zhang, J., Liu, B., Zhang, H. et al. A small object detection network for remote sensing based on CS-PANet and DSAN. Multimed Tools Appl 83, 72079–72096 (2024). https://doi.org/10.1007/s11042-024-18397-4
DOI: https://doi.org/10.1007/s11042-024-18397-4