A Novel Effectively Optimized One-Stage Network for Object Detection in Remote Sensing Imagery
"> Figure 1
<p>Examples difficult to detect. Objects in remote sensing images are not only very small and densely clustered but also arranged with diverse orientations and extremely complex background.</p> "> Figure 2
<p>Detection process. The detection process can be divided into six parts. Specifically, the dataset construction will be described in detail in <a href="#sec3dot1dot1-remotesensing-11-01376" class="html-sec">Section 3.1.1</a>, the split and merge strategy in <a href="#sec3dot3dot1-remotesensing-11-01376" class="html-sec">Section 3.3.1</a>, the multi-scale training in <a href="#sec3dot3dot2-remotesensing-11-01376" class="html-sec">Section 3.3.2</a>, the network architecture in <a href="#sec2dot2-remotesensing-11-01376" class="html-sec">Section 2.2</a>, and the regression and classification in <a href="#sec2dot3-remotesensing-11-01376" class="html-sec">Section 2.3</a>.</p> "> Figure 3
<p>Heatmaps. (<b>a</b>) is an input image; (<b>b</b>–<b>d</b>) demonstrate the heatmaps collected from the forepart of the network; (<b>e</b>–<b>i</b>) show the heatmaps collected from latter layers of the network.</p> "> Figure 4
<p>Feature extraction structure. The feature extraction structure of novel effectively optimized one-stage network (NEOON) constructs top-down and bottom-up inference through a series of dwon-sampling operations and corresponding up-sampling operations.</p> "> Figure 5
<p>Feature fusion structure. Concatenation operation and corresponding convolutional operations are employed four times in NEOON to achieve feature fusion. Note that residual modules are adopted to accelerate and optimize the fitting process of the model. Subsequently, four detectors in increasing layers access the progressively fused feature maps, all four parallel.</p> "> Figure 6
<p>Feature enhancement structure. (<b>a</b>) The mechanism of Receptive-Filed Enhancement (RFE) module from receptive field; (<b>b</b>) the detailed architecture of RFE module.</p> "> Figure 7
<p>Network structure detail. As a fully convolutional network, NEOON consists of more than fifty convolutions, 21 residual modules and an RFE module in practice.</p> "> Figure 8
<p>Effect of the Soft-non-maximum suppression (NMS). In (<b>a</b>), a car cannot be detected because its confidence is set to 0 by NMS due to its Intersection-over-Union (IOU) with the nearest car, having a higher confidence 0.92, more than a thresh; however, in (<b>b</b>), the confidence of the car not detected in (<b>a</b>) is set to 0.70 instead of 0 according to Equation (<a href="#FD13-remotesensing-11-01376" class="html-disp-formula">13</a>).</p> "> Figure 9
<p>Split and merge strategy. We tend to split the images into several square chips and each chip is detected by the network separately to produce a single result. Finally, all the results are merged into a large image with the same size as the original image.</p> "> Figure 10
<p>Some detection results with three categories ((<b>a</b>) airplane; (<b>b</b>) car; (<b>c</b>) ship) to validate the performance of NEOON as well as the split and merge strategy. The first row shows the original images to be detected. The second row demonstrates the results obtained by YOLOv3. The third row shows the results obtained by YOLOv3 with the split and merge strategy. The last row is the detection results by NEOON with the split and merge strategy. In (<b>a</b>), YOLOv3 gets similar results as YOLOv3+split; however, YOLOv3+split achieves a higher classification confidence score than YOLOv3. Note that the first two methods cannot detect the tiny objects such as the two airplanes at the top of the testing image. However, the proposed NEOON can make it and detect more objects; in (<b>b</b>), cars are harder to detect than airplanes because they occupy fewer pixels and YOLOv3 can detect only a few cars. With split and merge strategy, both YOLOv3 and NEOON have shown a great detection performance especially for NEOON which is better at detecting indistinguishable small objects; in (<b>c</b>), the same as the cases in (<b>a</b>,<b>b</b>), the proposed NEOON has achieved the best results in ship detection.</p> "> Figure 11
<p>Some instances of detection results obtained by NEOON with three categories ((<b>a</b>) airplane; (<b>b</b>) car; (<b>c</b>) ship).</p> "> Figure 12
<p>Some instances of detection results obtained by NEOON in 10 categories.</p> "> Figure 13
<p>Subjective and objective effect. In (<b>a</b>), objects in a remote sensing image become more and more blurred as the resolution of the image decreases. In (<b>b</b>), AP and mAP curves (IoU = 0.5) of 3 categories as red, green and blue line for airplane, car and ship. As shown, AP and mAP rise with the increase in image resolution.</p> ">
Abstract
1. Introduction
- In satellite imagery, objects of interest, such as ships [49,50], are often densely arranged [51] and may occupy only a few pixels [52,53,54] (see Figure 1), unlike the large and prominent subjects in general object datasets such as Microsoft COCO [26]. For some objects such as cars, each instance may span only about 15 pixels even at the highest resolution.
- High-quality training data are insufficient: only a small number of well-labelled geospatial images are publicly available. Meanwhile, the quantity and quality of remote sensing images have grown rapidly, which demands fast and effective approaches to real-time object localization [33].
- Geospatial images differ from the general object images captured in everyday life. Objects viewed from overhead, such as airplanes in an airport [55,56,57], can appear at multiple scales and arbitrary orientations. In addition, changing illumination, unusual aspect ratios, and complex backgrounds make detection difficult.
- We analyze how the features of small objects in geospatial images behave as a deep CNN processes them, and from this derive our main idea: making the best use of small-object information in the forepart of the network to cope with remote sensing detection tasks.
- We propose a novel one-stage detection framework named NEOON that achieves satisfactory performance in detecting densely arranged small objects in remote sensing imagery. NEOON focuses on extracting the spatial information of high-resolution remote sensing images by analyzing the combination of feature and semantic information of small objects.
- The Focal Loss [60] is introduced into Darknet as the classification loss to address class imbalance, the main reason that two-stage methods usually outperform one-stage methods in detection accuracy.
- For densely arranged objects, we adopt Soft-NMS [61] in post-processing, modifying its code for the Darknet framework so that accurate bounding boxes are preserved.
- Extensive experiments are conducted on both the ACS dataset and the NWPU VHR-10 dataset. The experimental design and result analysis thoroughly validate the effectiveness of NEOON; specifically, we report Precision and Recall and discuss the influence of image resolution on detection performance.
- The split and merge strategy, as well as multi-scale training, are employed and prove effective in this work (a minimal sketch of the split and merge procedure follows this list). To ensure that NEOON runs smoothly and efficiently, we updated the C library of Darknet [62], modifying a considerable amount of C code, and wrote a number of supporting Python scripts.
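To make the split and merge strategy concrete, the following minimal Python sketch splits a large image into overlapping square chips and shifts per-chip detections back into the coordinate frame of the original image. The chip size of 416, the 20% overlap, and all function names are illustrative assumptions rather than the authors' released code; the strategy itself is detailed in Section 3.3.1.

```python
import numpy as np

def split_into_chips(image, chip_size=416, overlap=0.2):
    """Split a large image into overlapping square chips.

    Yields (chip, x_offset, y_offset) so that detections on each chip
    can be shifted back into the coordinate frame of the full image.
    """
    stride = max(1, int(chip_size * (1.0 - overlap)))
    h, w = image.shape[:2]
    xs = sorted({*range(0, max(w - chip_size, 0) + 1, stride), max(w - chip_size, 0)})
    ys = sorted({*range(0, max(h - chip_size, 0) + 1, stride), max(h - chip_size, 0)})
    for y in ys:
        for x in xs:
            yield image[y:y + chip_size, x:x + chip_size], x, y

def merge_detections(per_chip_results):
    """Shift per-chip boxes (x1, y1, x2, y2, score, cls) by their chip
    offsets and pool them into one list covering the full image."""
    merged = []
    for boxes, x_off, y_off in per_chip_results:
        for x1, y1, x2, y2, score, cls in boxes:
            merged.append((x1 + x_off, y1 + y_off, x2 + x_off, y2 + y_off, score, cls))
    return merged
```

A final NMS or Soft-NMS pass over the merged list removes the duplicate detections that the chip overlap inevitably creates.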
2. Proposed Method
- Feature extraction. The backbone of NEOON undertakes feature extraction, which directly affects the final performance. As a partly symmetrical architecture, it performs bottom-up and corresponding top-down processing, with several residual modules [40] adopted to accelerate and optimize the fitting of the model.
- Feature fusion. Concatenation and subsequent convolutional operations are carried out on four parallel sets of feature maps to implement feature fusion across the backbone, effectively combining low-level and high-level features (a sketch of this pattern follows this list).
- Feature enhancement. We construct an RFE module following RFBNet [63] and incorporate it into NEOON. It is located at the forepart of the backbone specifically to enhance the feature information of small objects of interest (see the second sketch after this list).
- Multi-scale detectors. Four parallel detectors with different sensitivities play a vital role in capturing and utilizing the features of objects at different scales.
- Focal loss. We introduce the Focal Loss [60] as the classification loss because it has proved helpful in improving the performance of one-stage methods by addressing the class imbalance problem (see the third sketch after this list).
- Post-processing. Soft non-maximum suppression (Soft-NMS) [61] is utilized in post-processing to filter bounding boxes more reasonably and improve detection accuracy, especially for densely arranged objects (see the last sketch after this list).
- Implementation strategy. The split and merge strategy, as well as multi-scale training, are employed because the images are extremely large and object sizes vary widely.
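The fusion pattern in the second item can be pictured with a short sketch. NEOON itself is implemented in Darknet's C library and its exact layer widths are given in Figure 7; the PyTorch fragment below is only a hedged illustration of the upsample-concatenate-convolve pattern, with the class name, the 1 × 1 mixing kernel, and the nearest-neighbour upsampling being our assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConcatFusion(nn.Module):
    """One fusion step: upsample the coarse, semantically strong map to
    the fine map's spatial size, concatenate along channels, and mix
    with a convolution. NEOON applies this pattern four times."""
    def __init__(self, low_ch, high_ch, out_ch):
        super().__init__()
        self.mix = nn.Conv2d(low_ch + high_ch, out_ch, kernel_size=1)

    def forward(self, low, high):
        high_up = F.interpolate(high, size=low.shape[-2:], mode="nearest")
        return self.mix(torch.cat([low, high_up], dim=1))
```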
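Similarly, the mechanism of the RFE module can be conveyed by a hedged sketch of an RFB-style block: parallel branches with growing dilation rates emulate receptive fields of different sizes, and their concatenated outputs are merged with a shortcut connection. The precise branch configuration of the paper's module is shown in Figure 6; the structure below merely follows the general RFBNet [63] design, so every width and dilation rate here is an assumption.

```python
import torch
import torch.nn as nn

class RFELike(nn.Module):
    """RFB-style receptive-field enhancement: parallel 3x3 branches with
    growing dilation emulate receptive fields of different sizes; their
    outputs are concatenated, projected back to the input width, and
    added to a shortcut connection."""
    def __init__(self, channels):
        super().__init__()
        mid = channels // 4
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, mid, kernel_size=1),
                nn.Conv2d(mid, mid, kernel_size=3, padding=d, dilation=d),
            )
            for d in (1, 2, 3)
        )
        self.project = nn.Conv2d(3 * mid, channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        out = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.act(x + self.project(out))
```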
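For reference, the Focal Loss of [60] down-weights well-classified examples so that the overwhelming number of easy background samples no longer dominates training. A minimal NumPy sketch of the binary case follows; the defaults α = 0.25 and γ = 2 are those recommended in [60], not values confirmed by this paper, whose actual implementation lives in Darknet's C code.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted probabilities; y: labels in {0, 1}. With gamma > 0 the
    (1 - p_t)**gamma factor shrinks the loss of easy examples, so the
    few hard (often foreground) examples dominate training instead of
    the overwhelming number of easy background ones.
    """
    p = np.clip(p, 1e-7, 1.0 - 1e-7)
    p_t = np.where(y == 1, p, 1.0 - p)             # prob. of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```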
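Finally, Soft-NMS replaces the hard score zeroing of standard NMS with a continuous decay, which is why the car in Figure 8b keeps a confidence of 0.70 instead of 0. The sketch below implements the Gaussian decay variant of [61]; we assume this corresponds to the paper's Equation (13), and σ = 0.5 is the default from [61], not a value reported by the authors.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay the scores of neighbours of the current
    top box by exp(-IoU^2 / sigma) instead of zeroing them, so densely
    packed true positives survive with reduced confidence."""
    scores = scores.astype(float).copy()
    idx = np.arange(len(scores))
    keep = []
    while idx.size > 0:
        top = idx[np.argmax(scores[idx])]
        keep.append(int(top))
        idx = idx[idx != top]
        if idx.size == 0:
            break
        scores[idx] *= np.exp(-iou(boxes[top], boxes[idx]) ** 2 / sigma)
        idx = idx[scores[idx] > score_thresh]  # discard near-duplicates
    return keep
```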
2.1. Feature Analysis
2.2. NEOON Network
2.2.1. Feature Extraction
2.2.2. Feature Fusion
2.2.3. Feature Enhancement
2.2.4. Multi-Scale Detection
2.3. Model Training
2.3.1. Overview
2.3.2. Regression
2.3.3. Classification
2.4. Post-Processing
3. Experimental Settings and Implementation Details
3.1. Dataset
3.1.1. ACS Dataset
- Images in the ACS dataset are collected at multiple resolutions and from multiple viewpoints, so similar objects appear at multiple scales and angles.
- Objects of these three classes occupy fewer pixels than those of other classes, such as bridges or basketball courts.
3.1.2. NWPU VHR-10 Dataset
3.2. Baseline Method and Compared Methods
- COPD, which is made up of 45 seed-based part detectors. Each part detector is a linear support vector machine (SVM) classifier corresponding to a particular viewpoint of an object class, so their collection provides a solution for rotation-invariant object detection.
- YOLOv2, in which anchor priors and multi-scale training techniques are applied to predict location candidates. Darknet-19, which has 19 convolutional layers, 5 max-pooling layers, and no fully connected layers, is used to extract object features.
- RICNN, which learns a new rotation-invariant layer on top of AlexNet to deal with the problem of object rotation variations.
- SSD, in which small convolutional filters, rather than the fully connected layers of region-based methods, are applied to each feature map to predict box offsets and category scores. Additionally, SSD uses multiple feature representations to detect objects with different scales and aspect ratios.
- R-P-Faster R-CNN, which integrates the region proposal network and classification procedure through sharing the convolutional weights.
3.3. Implementation Details
3.3.1. Split and Merge Strategy
3.3.2. Multi-Scale Training Strategy
4. Experimental Results and Analysis
4.1. Results and Analysis on ACS
4.2. Results and Analysis on NWPU VHR-10
4.3. Fine-Grained Feature Impact Analysis
4.4. Discussion
- About the Soft-NMS. As demonstrated in Section 2.4, Soft-NMS does work in specific situations where objects are densely arranged, such as when square boxes are predicted for obliquely and tightly aligned cars in Figure 8. However, it plays only a limited role when the objects of interest are not densely arranged, which is the more common case. Soft-NMS is therefore best applied in post-processing under those specific circumstances rather than universally.
- About the RFE module. In our experiments, the RFE module does work and improves both the subjective and the objective results. However, in some test images where small and large objects coexist, the RFE module raised the recall of small objects while leaving some large objects undetected, which requires further investigation.
- About the Darknet framework. As shown in Table 3, the AP values of two classes, tennis court and basketball court, are much lower than those of SSD and R-P-Faster R-CNN; the same holds for YOLOv2, which, like NEOON, adopts Darknet as its basic framework. We therefore suppose that this issue is related to the algorithmic mechanisms of the Darknet framework to some extent.
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Xie, W.; Shi, Y.; Li, Y.; Jia, X.; Lei, J. High-Quality Spectral-Spatial Reconstruction Using Saliency Detection and Deep Feature Enhancement. Pattern Recognit. 2019, 88, 139–152.
- Xie, W.; Jiang, T.; Li, Y.; Jia, X.; Lei, J. Structure Tensor and Guided Filtering-Based Algorithm for Hyperspectral Anomaly Detection. IEEE Trans. Geosci. Remote Sens. 2019, 1–13.
- Wang, Q.; Meng, Z.; Li, X. Locality Adaptive Discriminant Analysis for Spectral-Spatial Classification of Hyperspectral Images. IEEE Geosci. Remote Sens. Lett. 2017, 14, 2077–2081.
- Cheng, G.; Han, J. A Survey on Object Detection in Optical Remote Sensing Images. ISPRS J. Photogramm. Remote Sens. 2016, 117, 11–28.
- Etten, A.V. You Only Look Twice: Rapid Multi-Scale Object Detection In Satellite Imagery. arXiv 2018, arXiv:1805.09512.
- Hu, Y.; Chen, J.; Pan, D.; Hao, Z. Edge-Guided Image Object Detection in Multiscale Segmentation for High-Resolution Remotely Sensed Imagery. IEEE Trans. Geosci. Remote Sens. 2016, 54, 4702–4711.
- Qiu, S.; Wen, G.; Fan, Y. Occluded Object Detection in High-Resolution Remote Sensing Images Using Partial Configuration Object Model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 1909–1925.
- Wang, Q.; Gao, J.; Yuan, Y. Embedding Structured Contour and Location Prior in Siamesed Fully Convolutional Networks for Road Detection. IEEE Trans. Intell. Transp. Syst. 2018, 19, 230–241.
- Peng, X.; Feng, J.; Xiao, S.; Yau, W.; Zhou, J.T.; Yang, S. Structured AutoEncoders for Subspace Clustering. IEEE Trans. Image Process. 2018, 27, 5076–5086.
- Debnath, S.; Chinthavali, M. Multiple Marginal Fisher Analysis. IEEE Trans. Ind. Electron. 2018, 65, 9215–9224.
- Hwang, K.C. A Modified Sierpinski Fractal Antenna for Multiband Application. IEEE Antennas Wirel. Propag. Lett. 2007, 6, 357–360.
- Guido, R.C. Practical and Useful Tips on Discrete Wavelet Transforms. IEEE Signal Process. Mag. 2015, 32, 162–166.
- Guariglia, E. Entropy and Fractal Antennas. Entropy 2016, 18, 84.
- Guariglia, E. Harmonic Sierpinski Gasket and Applications. Entropy 2018, 20, 714.
- Hutchinson, J.E. Fractals and Self Similarity. Indiana Univ. Math. J. 1981, 30, 713–747.
- Liu, L.; Ouyang, W.; Wang, X.; Fieguth, P.W.; Chen, J.; Liu, X.; Pietikainen, M. Deep Learning for Generic Object Detection: A Survey. arXiv 2019, arXiv:1809.02165.
- Chahal, S.K.; Dey, K. A Survey of Modern Object Detection Literature Using Deep Learning. arXiv 2018, arXiv:1808.07256.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 24–27 June 2014; pp. 580–587.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 13–16 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Dai, J.; Li, Y.; He, K.; Sun, J. R-FCN: Object Detection via Region-Based Fully Convolutional Networks. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Barcelona, Spain, 5–10 December 2016; pp. 2294–2298.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788.
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.; Berg, A.C. SSD: Single Shot Multibox Detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 21–37.
- Lin, T.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollar, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
- Yang, X.; Fu, K.; Sun, H.; Sun, X.; Yan, M.; Diao, W.; Guo, Z. Object Detection with Head Direction in Remote Sensing Images Based on Rotational Region CNN. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Valencia, Spain, 22–27 July 2018; pp. 2507–2510.
- Wu, Z.; Gao, Y.; Li, L.; Fan, J. Research on Object Detection Technique in High Resolution Remote Sensing Images Based on U-Net. In Proceedings of the Chinese Control And Decision Conference (CCDC), Shenyang, China, 9–11 June 2018; pp. 2849–2853.
- Chan-Hon-Tong, A.; Audebert, N. Object Detection in Remote Sensing Images with Center Only. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Valencia, Spain, 22–27 July 2018; pp. 7054–7057.
- Li, Q.; Mou, L.; Jiang, K.; Liu, Q.; Wang, Y.; Zhu, X. Hierarchical Region Based Convolution Neural Network for Multiscale Object Detection in Remote Sensing Images. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Valencia, Spain, 22–27 July 2018; pp. 4355–4358.
- Tayara, H.; Chong, K.T. Object Detection in Very High-Resolution Aerial Images Using One-Stage Densely Connected Feature Pyramid Network. Sensors 2018, 18, 3341.
- Zou, Z.; Shi, Z. Random Access Memories: A New Paradigm for Target Detection in High Resolution Aerial Remote Sensing Images. IEEE Trans. Image Process. 2018, 27, 1100–1111.
- Zhuang, S.; Wang, P.; Jiang, B.; Wang, G.; Wang, C. A Single Shot Framework with Multi-Scale Feature Fusion for Geospatial Object Detection. Remote Sens. 2019, 11, 594.
- Ma, W.; Guo, Q.; Wu, Y.; Zhao, W.; Zhang, X.; Jiao, L. A Novel Multi-Model Decision Fusion Network for Object Detection in Remote Sensing Images. Remote Sens. 2019, 11, 737.
- Zhang, X.; Zhu, K.; Chen, G.; Tan, X.; Zhang, L.; Dai, F.; Liao, P.J.; Gong, Y. Geospatial Object Detection on High Resolution Remote Sensing Imagery Based on Double Multi-Scale Feature Pyramid Network. Remote Sens. 2019, 11, 755.
- Li, J.; Dai, Y.; Li, C.; Shu, J.; Li, D.; Yang, T.; Lu, Z. Visual Detail Augmented Mapping for Small Aerial Target Detection. Remote Sens. 2019, 11, 14.
- Chen, S.; Zhan, R.; Zhang, J. Geospatial Object Detection in Remote Sensing Imagery Based on Multiscale Single-Shot Detector with Activated Semantics. Remote Sens. 2018, 10, 820.
- Guo, W.; Yang, W.; Zhang, H.; Hua, G. Geospatial Object Detection in High Resolution Satellite Images Based on Multi-Scale Convolutional Neural Network. Remote Sens. 2018, 10, 131.
- Han, X.; Zhong, Y.; Zhang, L. An Efficient and Robust Integrated Geospatial Object Detection Framework for High Spatial Resolution Remote Sensing Imagery. Remote Sens. 2017, 9, 666.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Zhong, Y.; Han, X.; Zhang, L. Multi-Class Geospatial Object Detection Based on A Position-Sensitive Balancing Framework for High Spatial Resolution Remote Sensing Imagery. ISPRS J. Photogramm. Remote Sens. 2018, 138, 281–294.
- Shrivastava, A.; Gupta, A.; Girshick, R. Training Region-Based Object Detectors with Online Hard Example Mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 761–769.
- Ding, P.; Zhang, Y.; Deng, W.; Jia, P.; Kuijper, A. A Light and Faster Regional Convolutional Neural Network for Object Detection in Optical Remote Sensing Images. ISPRS J. Photogramm. Remote Sens. 2018, 141, 208–218.
- Long, Y.; Gong, Y.; Xiao, Z.; Liu, Q. Accurate Object Localization in Remote Sensing Images Based on Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2486–2498.
- Li, K.; Cheng, G.; Bu, S.; You, X. Rotation-Insensitive and Context-Augmented Object Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2018, 56, 2337–2348.
- Lin, T.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 734–750.
- Newell, A.; Yang, K.; Deng, J. Stacked Hourglass Networks for Human Pose Estimation. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 483–499.
- Liu, W.; Ma, L.; Chen, H. Arbitrary-Oriented Ship Detection Framework in Optical Remote-Sensing Images. IEEE Geosci. Remote Sens. Lett. 2018, 15, 937–941.
- Yang, F.; Xu, Q.; Li, B.; Ji, Y. Ship Detection From Thermal Remote Sensing Imagery Through Region-Based Deep Forest. IEEE Geosci. Remote Sens. Lett. 2017, 15, 449–453.
- Deng, Z.; Lei, L.; Sun, H.; Zou, H.; Zhou, S.; Zhao, J. An Enhanced Deep Convolutional Neural Network for Densely Packed Objects Detection in Remote Sensing Images. In Proceedings of the International Workshop on Remote Sensing with Intelligent Processing (RSIP), Shanghai, China, 18–21 May 2017; pp. 1–4.
- Zhang, W.; Wang, S.; Thachan, S.; Chen, J.; Qian, Y. Deconv R-CNN for Small Object Detection on Remote Sensing Images. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Valencia, Spain, 22–27 July 2018; pp. 2483–2486.
- Li, J.; Liang, X.; Wei, Y.; Xu, T.; Feng, J.; Yan, S. Perceptual Generative Adversarial Networks for Small Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1222–1230.
- Chen, C.; Liu, M.; Tuzel, O.; Xiao, J. R-CNN for Small Object Detection. In Proceedings of the 13th Asian Conference on Computer Vision (ACCV), Taipei, Taiwan, 20–24 November 2016.
- Cai, B.; Jiang, Z.; Zhang, H.; Yao, Y.; Nie, S. Online Exemplar-Based Fully Convolutional Network for Aircraft Detection in Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1095–1099.
- Budak, U.; Sengur, A.; Halici, U. Deep Convolutional Neural Networks for Airport Detection in Remote Sensing Images. In Proceedings of the Signal Processing and Communications Applications Conference (SIU), Cesme-Izmir, Turkey, 2–5 May 2018; pp. 1–4.
- Han, Z.; Zhang, H.; Zhang, J.; Hu, X. Fast Aircraft Detection Based on Region Locating Network in Large-Scale Remote Sensing Images. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 2294–2298.
- Cheng, G.; Zhou, P.; Han, J. Learning Rotation-Invariant Convolutional Neural Networks for Object Detection in VHR Optical Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7405–7415.
- Cheng, G.; Han, J.; Zhou, P.; Guo, L. Multi-class Geospatial Object Detection and Geographic Image Classification Based on Collection of Part Detectors. ISPRS J. Photogramm. Remote Sens. 2014, 98, 119–132.
- Lin, T.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2999–3007.
- Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Improving Object Detection with One Line of Code. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 5562–5570.
- Redmon, J. Darknet: Open Source Neural Networks in C. 2013–2016. Available online: http://pjreddie.com/darknet/ (accessed on 5 June 2019).
- Liu, S.; Huang, D.; Wang, Y. Receptive Field Block Net for Accurate and Fast Object Detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 385–400.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239.
- Xia, G.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 19–21 June 2018; pp. 3974–3983.
Table 1. Number of airplane, ship, and car instances in several public datasets and in the ACS dataset.

| Dataset | Airplane | Ship | Car |
|---|---|---|---|
| DOTA | 2933 | 6886 | 456 |
| UCAS-AOD | - | - | 3791 |
| NWPU VHR-10 | 754 | - | 596 |
| RSOD | 5374 | - | - |
| LEVIR | 3967 | 2627 | - |
| ACS | 13,082 | 9513 | 4843 |
Table 2. Ablation results on the ACS dataset: per-category AP and Recall, mAP, and mRecall.

| Method | Airplane AP | Airplane Recall | Car AP | Car Recall | Ship AP | Ship Recall | mAP | mRecall |
|---|---|---|---|---|---|---|---|---|
| YOLOv3 | 71.55% | 75.73% | 48.91% | 71.82% | 54.17% | 71.95% | 58.21% | 73.17% |
| YOLOv3+split | 85.98% | 86.77% | 90.58% | 93.60% | 69.10% | 81.19% | 81.88% | 87.19% |
| D: C–SoftNMS | 87.95% | 91.11% | 91.38% | 93.34% | 73.01% | 83.31% | 84.11% | 89.25% |
| C: B–FocalLoss | 88.65% | 92.24% | 91.54% | 94.23% | 72.88% | 84.35% | 84.36% | 90.27% |
| B: A–RFEmodule | 89.36% | 94.14% | 92.07% | 96.07% | 74.91% | 86.44% | 85.45% | 92.22% |
| A: NEOON+split | 94.49% | 95.37% | 93.22% | 96.87% | 72.25% | 85.83% | 86.65% | 92.69% |
Table 3. Comparison of AP per category, mAP, and average running time on the NWPU VHR-10 dataset.

| Category | COPD | YOLOv2 | RICNN | SSD | R-P-Faster R-CNN | NEOON |
|---|---|---|---|---|---|---|
| Airplane | 62.3% | 73.3% | 88.4% | 95.7% | 90.4% | 78.29% |
| Ship | 68.9% | 74.9% | 77.3% | 82.9% | 75.0% | 81.68% |
| Storage Tank | 63.7% | 34.4% | 85.3% | 85.6% | 44.4% | 94.62% |
| Baseball Diamond | 83.3% | 88.9% | 88.1% | 96.6% | 89.9% | 89.74% |
| Tennis Court | 32.1% | 29.1% | 40.8% | 82.1% | 79.7% | 61.25% |
| Basketball Court | 36.3% | 27.6% | 58.5% | 86.0% | 77.6% | 65.04% |
| Ground Track Field | 85.3% | 98.8% | 86.7% | 58.2% | 87.7% | 93.23% |
| Harbor | 55.3% | 75.4% | 68.6% | 54.8% | 79.1% | 73.15% |
| Bridge | 14.8% | 51.8% | 61.5% | 41.9% | 68.2% | 59.46% |
| Vehicle | 44.0% | 51.3% | 71.1% | 75.6% | 73.2% | 78.26% |
| mAP | 54.6% | 60.5% | 72.6% | 75.9% | 76.5% | 77.5% |
| Average Running Time (s) | 1.070 | 0.026 | 8.770 | 0.027 | 0.150 | 0.059 |
Table 4. Detection performance on the ACS dataset at reduced image resolutions (AP per category and mAP).

| Resolution | Airplane AP | Car AP | Ship AP | mAP |
|---|---|---|---|---|
| Original | 94.49% | 93.22% | 72.25% | 86.65% |
| 0.8× | 84.22% | 87.53% | 79.96% | 83.90% |
| 0.6× | 78.25% | 88.18% | 74.65% | 80.36% |
| 0.4× | 64.88% | 70.02% | 60.21% | 65.04% |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Citation: Xie, W.; Qin, H.; Li, Y.; Wang, Z.; Lei, J. A Novel Effectively Optimized One-Stage Network for Object Detection in Remote Sensing Imagery. Remote Sens. 2019, 11, 1376. https://doi.org/10.3390/rs11111376