Object Detection in Very High-Resolution Aerial Images Using One-Stage Densely Connected Feature Pyramid Network
Figure 1. Comparison between the scales of objects in natural images from the COCO dataset (a) and in VHR aerial images from the NWPU VHR-10 dataset (b). Vehicles in natural images occupy a larger area than vehicles in VHR aerial images.
Figure 2. The overall architecture of the proposed model.
Figure 3. The architecture of the densely connected feature pyramid network.
Figure 4. The architecture of the classification and regression heads.
Figure 5. Examples of the data augmentation technique. The first row shows the input images, while the second and third rows show the augmented outputs.
Figure 6. Detection results of the proposed model in terms of AP using different backbones: VGG-16, Resnet 50, and Resnet 101.
Figure 7. Comparison of the area under the precision-recall curve with different state-of-the-art models.
Figure 8. Some object detection results from the NWPU VHR-10 dataset. Yellow, red, and blue represent true positive, false negative, and false positive cases, respectively. (a) airplane, (b) ship, (c) storage tank, (d) baseball diamond, (e) tennis court, (f) basketball court, (g) ground track field, (h) harbor, (i) bridge, (j) vehicle, (k–o) some false positive and false negative cases.
Figure 9. Some object detection results from the RSOD dataset. Yellow, red, and blue represent true positive, false negative, and false positive cases, respectively. (a–c) true positive detections of oil tanks, (d–f) true positive detections of overpasses, (g–i) true positive detections of playgrounds, (j–l) true positive detections of aircraft, and (m–o) false positive and false negative cases.
Abstract
1. Introduction
2. Related Works
3. Methodology
3.1. The Proposed Model
3.2. Loss Function
3.2.1. Bounding Box Regression Loss Function
3.2.2. Classification Loss Function
3.3. Implementation Details
4. Experimental Results
4.1. Datasets Description
4.2. Evaluation Metrics
4.2.1. Precision-Recall Curve
4.2.2. Average Precision
4.3. Results
- Bag of Words (BoW) [36]: This work used the K-means algorithm to generate a histogram of visual words that represents each image region.
- Spatial Sparse Coding BoW (SSCBoW) [35]: This work used a sparse coding algorithm to generate the visual words.
- Collection of Part Detectors (COPD) [37]: This method used 45 seed-part linear SVM detectors trained on HOG features, resulting in a rotation-invariant object detection model.
- Rotation-invariant CNN (RICNN) [29]: This work added a new layer to AlexNet to deal with rotated objects.
- Faster R-CNN [22]: A two-stage object detection CNN. The first stage proposes a set of candidate objects, whereas the second stage classifies them.
- Single Shot Multibox Detector (SSD) [24]: A unified one-stage model that utilizes feature maps at different scales.
- Rotation-insensitive CNN [48]: This work proposed a context-augmented feature fusion model and an RPN with multi-angle anchors.
- Deformable CNN [46]: This work proposed a deformable region-based fully convolutional network that uses deformable convolution layers instead of conventional ones.
- Multi-Scale CNN [47]: This work proposed feature maps with high semantic information at different scales.
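As a rough illustration of the BoW baseline in the list above, the sketch below builds a small visual vocabulary with K-means and represents a region as a histogram of visual words. It is a minimal sketch, not the implementation used in [36]: the 2-D toy descriptors, the function names, and the fixed initial centroids are all assumptions made for the example.

```python
from collections import Counter


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, centroid))
             for centroid in centroids]
    return dists.index(min(dists))


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's K-means starting from the given initial centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point, centroids)].append(point)
        for k, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def bow_histogram(descriptors, vocabulary):
    """Normalized histogram of visual-word occurrences for one image region."""
    counts = Counter(nearest(d, vocabulary) for d in descriptors)
    return [counts.get(k, 0) / len(descriptors) for k in range(len(vocabulary))]


# Hypothetical 2-D descriptors; real local descriptors have many more dimensions.
training = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
vocabulary = kmeans(training, centroids=[training[0], training[3]])

# One region with four descriptors: one near the first word, three near the second.
region = [(0.1, 0.1), (5.1, 5.0), (5.0, 4.8), (4.8, 5.2)]
histogram = bow_histogram(region, vocabulary)
print(histogram)  # [0.25, 0.75]
```

The resulting histogram is the fixed-length feature vector that a classifier would then consume for each region.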
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
1. Colomina, I.; Molina, P. Unmanned aerial systems for photogrammetry and remote sensing: A review. ISPRS J. Photogramm. Remote Sens. 2014, 92, 79–97.
2. Zhang, F.; Du, B.; Zhang, L.; Xu, M. Weakly supervised learning based on coupled convolutional neural networks for aircraft detection. IEEE Trans. Geosci. Remote Sens. 2016, 54, 5553–5563.
3. Kamusoko, C. Importance of Remote Sensing and Land Change Modeling for Urbanization Studies; Springer: Singapore, 2017.
4. Barrett, E. Introduction to Environmental Remote Sensing; Routledge: Abingdon, UK, 2003.
5. Cheng, G.; Han, J.; Zhou, P.; Guo, L. Scalable multi-class geospatial object detection in high-spatial-resolution remote sensing images. In Proceedings of the 2014 IEEE Geoscience and Remote Sensing Symposium, Quebec City, QC, Canada, 13–18 July 2014; pp. 2479–2482.
6. Tayara, H.; Soo, K.G.; Chong, K.T. Vehicle detection and counting in high-resolution aerial images using convolutional regression neural network. IEEE Access 2018, 6, 2220–2230.
7. Moranduzzo, T.; Melgani, F. Automatic car counting method for unmanned aerial vehicle images. IEEE Trans. Geosci. Remote Sens. 2014, 52, 1635–1647.
8. Moranduzzo, T.; Melgani, F. Detecting cars in UAV images with a catalog-based approach. IEEE Trans. Geosci. Remote Sens. 2014, 52, 6356–6367.
9. Wen, X.; Shao, L.; Fang, W.; Xue, Y. Efficient feature selection and classification for vehicle detection. IEEE Trans. Circuits Syst. Video Technol. 2015, 25, 508–517.
10. Yu, X.; Shi, Z. Vehicle detection in remote sensing imagery based on salient information and local shape feature. Optik-Int. J. Light Electron Opt. 2015, 126, 2485–2490.
11. Cai, H.; Su, Y. Airplane detection in remote sensing image with a circle-frequency filter. In Proceedings of the 2005 International Conference on Space Information Technology, Wuhan, China, 19–20 November 2005.
12. An, Z.; Shi, Z.; Teng, X.; Yu, X.; Tang, W. An automated airplane detection system for large panchromatic image with high spatial resolution. Optik-Int. J. Light Electron Opt. 2014, 125, 2768–2775.
13. Bo, S.; Jing, Y. Region-based airplane detection in remotely sensed imagery. In Proceedings of the 2010 3rd International Congress on Image and Signal Processing, Yantai, China, 16–18 October 2010.
14. Sirmacek, B.; Unsalan, C. A probabilistic framework to detect buildings in aerial and satellite images. IEEE Trans. Geosci. Remote Sens. 2011, 49, 211–221.
15. Stankov, K.; He, D.C. Detection of buildings in multispectral very high spatial resolution images using the percentage occupancy hit-or-miss transform. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 4069–4080.
16. Zhang, L.; Shi, Z.; Wu, J. A hierarchical oil tank detector with deep surrounding features for high-resolution optical satellite imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 4895–4909.
17. Ok, A.O.; Başeski, E. Circular oil tank detection from panchromatic satellite images: A new automated approach. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1347–1351.
18. Dai, D.; Yang, W. Satellite image classification via two-layer sparse coding with biased image representation. IEEE Geosci. Remote Sens. Lett. 2011, 8, 173–176.
19. Zhang, D.; Meng, D.; Han, J. Co-saliency detection via a self-paced multiple-instance learning framework. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 865–878.
20. Tian, Y.; Chen, C.; Shah, M. Cross-view image matching for geo-localization in urban environments. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
21. Girshick, R. Fast R-CNN. Available online: https://www.cv-foundation.org/openaccess/content_iccv_2015/html/Girshick_Fast_R-CNN_ICCV_2015_paper.html (accessed on 4 October 2018).
22. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
23. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Available online: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks (accessed on 4 October 2018).
24. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot Multibox Detector. Available online: https://link.springer.com/chapter/10.1007%2F978-3-319-46448-0_2 (accessed on 4 October 2018).
25. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
26. Everingham, M.; Ali Eslami, S.M.; Gool, L.V.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes Challenge: A Retrospective. Available online: https://link.springer.com/article/10.1007/s11263-014-0733-5 (accessed on 4 October 2018).
27. Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Computer Vision – ECCV 2014; Springer: Berlin, Germany, 2014.
28. Cheng, G.; Han, J. A survey on object detection in optical remote sensing images. ISPRS J. Photogramm. Remote Sens. 2016, 117, 11–28.
29. Cheng, G.; Zhou, P.; Han, J. Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7405–7415.
30. Qu, T.; Zhang, Q.; Sun, S. Vehicle detection from high-resolution aerial images using spatial pyramid pooling-based deep convolutional neural networks. Multimedia Tools Appl. 2017, 76, 21651–21663.
31. Tang, T.; Zhou, S.; Deng, Z.; Zou, H.; Lei, L. Vehicle detection in aerial images based on region convolutional neural networks and hard negative example mining. Sensors 2017, 17, 336.
32. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. Available online: https://arxiv.org/abs/1409.1556 (accessed on 4 October 2018).
33. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
34. Han, J.; Zhou, P.; Zhang, D.; Cheng, G.; Guo, L.; Liu, Z.; Bu, S.; Wu, J. Efficient, simultaneous detection of multi-class geospatial targets based on visual saliency modeling and discriminative learning of sparse coding. ISPRS J. Photogramm. Remote Sens. 2014, 89, 37–48.
35. Sun, H.; Sun, X.; Wang, H.; Li, Y.; Li, X. Automatic target detection in high-resolution remote sensing images using spatial sparse coding bag-of-words model. IEEE Geosci. Remote Sens. Lett. 2012, 9, 109–113.
36. Xu, S.; Fang, T.; Li, D.; Wang, S. Object classification of aerial images with bag-of-visual words. IEEE Geosci. Remote Sens. Lett. 2010, 7, 366–370.
37. Cheng, G.; Han, J.; Zhou, P.; Guo, L. Multi-class geospatial object detection and geographic image classification based on collection of part detectors. ISPRS J. Photogramm. Remote Sens. 2014, 98, 119–132.
38. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014.
39. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal loss for dense object detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
40. Yang, Y.; Zhuang, Y.; Bi, F.; Shi, H.; Xie, Y. M-FCN: Effective fully convolutional network-based airplane detection framework. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1293–1297.
41. Han, J.; Zhang, D.; Cheng, G.; Guo, L.; Ren, J. Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning. IEEE Trans. Geosci. Remote Sens. 2015, 53, 3325–3337.
42. Jun, G.; Ghosh, J. Semisupervised learning of hyperspectral data with unknown land-cover classes. IEEE Trans. Geosci. Remote Sens. 2013, 51, 273–282.
43. Chen, C.; Gong, W.; Hu, Y.F.; Chen, Y.; Ding, Y.S. Learning Oriented Region-Based Convolutional Neural Networks for Building Detection in Satellite Remote Sensing Images. Available online: https://pdfs.semanticscholar.org/c549/a290c5f3efca6d91d698696d307b32ba251f.pdf (accessed on 4 October 2018).
44. Kampffmeyer, M.; Salberg, A.B.; Jenssen, R. Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Las Vegas, NV, USA, 26 June–1 July 2016.
45. Wegner, J.D.; Branson, S.; Hall, D.; Schindler, K.; Perona, P. Cataloging public objects using aerial and street-level images – urban trees. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
46. Xu, Z.; Xu, X.; Wang, L.; Yang, R.; Pu, F. Deformable convnet with aspect ratio constrained NMS for object detection in remote sensing imagery. Remote Sens. 2017, 9, 12.
47. Guo, W.; Yang, W.; Zhang, H.; Hua, G. Geospatial object detection in high resolution satellite images based on multi-scale convolutional neural network. Remote Sens. 2018, 10, 1.
48. Li, K.; Cheng, G.; Bu, S.; You, X. Rotation-insensitive and context-augmented object detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2018, 56, 2337–2348.
49. Fizyr/Keras-Retinanet. Available online: https://github.com/fizyr/keras-retinanet (accessed on 4 October 2018).
50. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 3.
51. Long, Y.; Gong, Y.; Xiao, Z.; Liu, Q. Accurate object localization in remote sensing images based on convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2486–2498.
52. Han, X.; Zhong, Y.; Zhang, L. An efficient and robust integrated geospatial object detection framework for high spatial resolution remote sensing imagery. Remote Sens. 2017, 9, 7.
Class | # Instances |
---|---|
airplane | 757 |
ship | 302 |
storage tank | 655 |
baseball diamond | 390 |
tennis court | 524 |
basketball court | 159 |
ground track field | 163 |
harbor | 224 |
bridge | 124 |
vehicle | 477 |
Method | Airplane | Ship | Storage Tank | Baseball Diamond | Tennis Court | Basketball Court | Ground Track Field | Harbor | Bridge | Vehicle | mAP |
---|---|---|---|---|---|---|---|---|---|---|---|
BoW | 0.2496 | 0.5849 | 0.6318 | 0.0903 | 0.0472 | 0.0322 | 0.0777 | 0.5298 | 0.1216 | 0.0914 | 0.2457 |
SSC BoW | 0.5061 | 0.5084 | 0.3337 | 0.4349 | 0.0033 | 0.1496 | 0.1007 | 0.5833 | 0.1249 | 0.3361 | 0.3081 |
COPD | 0.6225 | 0.6887 | 0.6371 | 0.8327 | 0.3208 | 0.3625 | 0.8531 | 0.5527 | 0.1479 | 0.4403 | 0.5458 |
Transferred CNN | 0.661 | 0.569 | 0.843 | 0.816 | 0.35 | 0.459 | 0.8 | 0.62 | 0.423 | 0.429 | 0.597 |
RICNN | 0.8835 | 0.7734 | 0.8527 | 0.8812 | 0.4083 | 0.5845 | 0.8673 | 0.686 | 0.6151 | 0.711 | 0.7263 |
SSD | 0.957 | 0.829 | 0.856 | 0.966 | 0.821 | 0.86 | 0.582 | 0.548 | 0.419 | 0.756 | 0.7594 |
Faster R-CNN | 0.946 | 0.823 | 0.6532 | 0.955 | 0.819 | 0.897 | 0.924 | 0.724 | 0.575 | 0.778 | 0.8094 |
Deformable CNN | 0.873 | 0.814 | 0.636 | 0.904 | 0.816 | 0.741 | 0.903 | 0.753 | 0.714 | 0.755 | 0.7909 |
Rotation-Insensitive CNN | 0.997 | 0.908 | 0.9061 | 0.9291 | 0.9029 | 0.8013 | 0.9081 | 0.8029 | 0.6853 | 0.8714 | 0.8712 |
Multi-Scale CNN | 0.993 | 0.92 | 0.832 | 0.972 | 0.908 | 0.926 | 0.981 | 0.851 | 0.719 | 0.859 | 0.8961 |
Ours (VGG-16) | 0.9977 | 0.926 | 0.8652 | 0.9689 | 0.9839 | 0.7997 | 0.9752 | 0.8846 | 0.8111 | 0.8514 | 0.9063 |
Ours (Resnet 50) | 0.971 | 0.9361 | 0.7958 | 0.9628 | 0.9424 | 0.9149 | 0.998 | 0.9071 | 0.782 | 0.8315 | 0.9042 |
Ours (Resnet 101) | 0.9906 | 0.9182 | 0.842 | 0.9459 | 0.9263 | 0.8503 | 0.9839 | 0.9381 | 0.9142 | 0.8359 | 0.9146 |
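
The mAP column in the table above appears to be the unweighted mean of the ten per-class AP values. The sketch below checks this for the three rows of the proposed model; it assumes that definition of mAP, and the dictionary and function names are illustrative only. The per-class values are rounded in the table, so the recomputed mean can differ from the reported one in the last digit.

```python
# Per-class AP values copied from the table above, in column order
# (airplane ... vehicle), followed by the mAP reported in the last column.
rows = {
    "Ours (VGG-16)": ([0.9977, 0.926, 0.8652, 0.9689, 0.9839,
                       0.7997, 0.9752, 0.8846, 0.8111, 0.8514], 0.9063),
    "Ours (Resnet 50)": ([0.971, 0.9361, 0.7958, 0.9628, 0.9424,
                          0.9149, 0.998, 0.9071, 0.782, 0.8315], 0.9042),
    "Ours (Resnet 101)": ([0.9906, 0.9182, 0.842, 0.9459, 0.9263,
                           0.8503, 0.9839, 0.9381, 0.9142, 0.8359], 0.9146),
}


def mean_average_precision(per_class_ap):
    """Unweighted mean of the per-class average precision values."""
    return sum(per_class_ap) / len(per_class_ap)


recomputed = {name: mean_average_precision(aps) for name, (aps, _) in rows.items()}
for name, (_, reported) in rows.items():
    diff = abs(recomputed[name] - reported)
    print(f"{name:18s} recomputed={recomputed[name]:.4f} reported={reported:.4f} diff={diff:.4f}")
```

Because the mean is unweighted, a class with few instances, such as bridge, influences mAP as much as a class with many instances, such as airplane.
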
Method | Average Running Time per Image (s) |
---|---|
BoW | 5.32 |
SSC BoW | 40.32 |
COPD | 1.07 |
Transferred CNN | 5.24 |
RICNN | 8.77 |
SSD | 0.09 |
Faster R-CNN | 0.16 |
Deformable CNN | 0.201 |
Rotation-Insensitive CNN | 2.89 |
Multi-Scale CNN | 0.11 |
Ours (Resnet 101) | 0.088 |
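
To make the running times above easier to compare, the following sketch converts seconds per image into images per second and into a ratio relative to the proposed model. The timings are copied from the table; the helper name is hypothetical, and since the table does not state the hardware used for each method, the ratios should be read as indicative only.

```python
# Average running time per image in seconds, copied from the table above.
seconds_per_image = {
    "BoW": 5.32,
    "SSC BoW": 40.32,
    "COPD": 1.07,
    "Transferred CNN": 5.24,
    "RICNN": 8.77,
    "SSD": 0.09,
    "Faster R-CNN": 0.16,
    "Deformable CNN": 0.201,
    "Rotation-Insensitive CNN": 2.89,
    "Multi-Scale CNN": 0.11,
    "Ours (Resnet 101)": 0.088,
}


def images_per_second(seconds):
    """Throughput implied by an average per-image running time."""
    return 1.0 / seconds


fastest = min(seconds_per_image, key=seconds_per_image.get)
reference = seconds_per_image["Ours (Resnet 101)"]

# List methods from fastest to slowest with throughput and relative time.
for method, seconds in sorted(seconds_per_image.items(), key=lambda kv: kv[1]):
    print(f"{method:26s} {images_per_second(seconds):7.2f} img/s "
          f"{seconds / reference:7.1f}x the reference time")
```
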
Method | Aircraft | Oil Tank | Overpass | Playground | mAP |
---|---|---|---|---|---|
R-P-Faster R-CNN | 0.7084 | 0.9019 | 0.7874 | 0.9809 | 0.8447 |
Deformable R-FCN (ResNet-101) | 0.7150 | 0.9026 | 0.8148 | 0.9953 | 0.8570 |
Deformable R-FCN (ResNet-101) and arcNMS | 0.7187 | 0.9035 | 0.8959 | 0.9988 | 0.8792 |
Ours (VGG-16) | 0.8764 | 0.9712 | 0.9310 | 1.0 | 0.9447 |
Ours (Resnet 50) | 0.8576 | 0.9555 | 0.8528 | 0.9955 | 0.9153 |
Ours (Resnet 101) | 0.8625 | 0.9598 | 0.9467 | 0.9987 | 0.9419 |
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Tayara, H.; Chong, K.T. Object Detection in Very High-Resolution Aerial Images Using One-Stage Densely Connected Feature Pyramid Network. Sensors 2018, 18, 3341. https://doi.org/10.3390/s18103341