Real UAV-Bird Image Classification Using CNN with a Synthetic Dataset
Figure 1. Schematic representation of the data preparation (left), network training (middle), and testing (right) processes for experiment 1 (top) and experiment 2 (bottom).
Figure 2. Green-box studio designed in the Unity game engine environment: (a) virtual camera used to take a picture; (b) RW-UAV 3D model; (c) light sources to generate realistic pictures; (d) green-colored background, wall view; (e) example camera view.
Figure 3. Modern and historical city views designed in the Unity game engine (virtual environments): (a) game engine terrain of the whole design; (a1) modern city design; (a2) old city design; (b–g) city views from different angles.
Figure 4. Schematic representation of the creation of the RW-UAV class dataset.
Figure 5. Examples of real bird and RW-UAV images: (a–c) bird images; (d–f) RW-UAV images.
Figure 6. Sample output images of the CDNTS algorithm: (a–d) synthetic RW-UAV and bird image samples for the training process; (a1–d1) images (a–d) transformed by the CDNTS layer; (e,f) synthetic RW-UAV and bird image samples for the test process; (e1,f1) images (e,f) transformed by the CDNTS layer; (g–j) real RW-UAV and bird image samples for the test process; (g1–j1) images (g–j) transformed by the CDNTS layer.
Figure 7. Schematic representation of the CDNTS layer's nearest three-point selection process.
Figure 8. Schematic representation of the network combinations.
Figure 9. Experiments 1 and 2: precision–recall graph of the most successful networks: synthetic image test of SqueezeNet with the Adam optimizer (blue), real image test of SqueezeNet with the Adam optimizer (orange), synthetic image test of CDNTS + AlexNet with the SGD optimizer (green), and real image test of CDNTS + SqueezeNet with the RMSProp optimizer (red).
Figure 10. Time cost comparison of experiments 1 and 2. AlexNet (55%), SqueezeNet (74%), and VGG16 (70%) are the experiment 1 networks; CDNTS + AlexNet (89.9%), CDNTS + SqueezeNet (88.9%), and CDNTS + VGG16 (87.4%) are the experiment 2 networks (the percentages match the networks' AUC values in the results tables).
Abstract
1. Introduction
2. Related Work
3. Materials and Methods
3.1. Generate Synthetic Training Dataset
Algorithm 1: RW-UAV model rotation and synthetic image generator algorithm in the Unity game engine.
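The algorithm listing itself is not reproduced on this page. As a rough orientation, the sketch below is a minimal Python rendering of the loop the title describes, assuming step-wise Euler rotations of the 3D model and an MD5 check to discard duplicate frames (MD5 appears in the Abbreviations); `render_frame` is a hypothetical placeholder for the Unity virtual camera of Figure 2 (scripted in C# in the actual pipeline), not the authors' code.

```python
import hashlib
import itertools

# Hypothetical stand-in for the Unity virtual camera: returns the rendered
# image of the RW-UAV model at the given Euler rotation as PNG-encoded
# bytes. The stub only marks where that engine call belongs.
def render_frame(yaw: float, pitch: float, roll: float) -> bytes:
    raise NotImplementedError

def generate_synthetic_images(step_deg: int = 15) -> None:
    seen = set()  # MD5 digests of frames already written to disk
    angles = range(0, 360, step_deg)
    for yaw, pitch, roll in itertools.product(angles, repeat=3):
        frame = render_frame(yaw, pitch, roll)
        # MD5 (listed under Abbreviations) serves here to discard visually
        # identical frames produced by symmetric rotations of the model.
        digest = hashlib.md5(frame).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        with open(f"rwuav_{yaw:03d}_{pitch:03d}_{roll:03d}.png", "wb") as f:
            f.write(frame)
```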
3.2. Real-Test Dataset Collection and Preparation
3.3. Train and Test Setup
3.4. Corner Detection and Nearest Three-Point Selection Layer (CDNTS)
Algorithm 2: Corner detection and nearest three-point selection layer.
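Again the listing is not shown here; the following is a minimal sketch of one plausible reading of the CDNTS layer, assuming Shi–Tomasi corner detection via OpenCV's `goodFeaturesToTrack` and a brute-force nearest-neighbor search, with each corner joined to its three nearest neighbors to produce the line-drawing shapes of Figure 6. All parameter values are illustrative assumptions.

```python
import cv2
import numpy as np

def cdnts_transform(image_bgr: np.ndarray, max_corners: int = 100) -> np.ndarray:
    """Sketch of the CDNTS layer: detect corners, then connect each corner
    to its three nearest neighboring corners on a blank canvas."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=max_corners,
                                      qualityLevel=0.01, minDistance=5)
    canvas = np.zeros_like(gray)
    if corners is None:  # no corners found in a flat/blank image
        return canvas
    pts = corners.reshape(-1, 2)
    for p in pts:
        d2 = np.sum((pts - p) ** 2, axis=1)  # squared distances to all corners
        nearest = np.argsort(d2)[1:4]        # three nearest, skipping p itself
        for j in nearest:
            cv2.line(canvas,
                     tuple(int(v) for v in p),
                     tuple(int(v) for v in pts[j]),
                     color=255, thickness=1)
    return canvas
```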
4. Results and Discussion
4.1. Training and Test Results
4.2. Computation Times
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
UAV | Unmanned aerial vehicle |
RW-UAV | Rotary-wing UAV |
FW-UAV | Fixed-wing UAV |
ML | Machine learning |
AI | Artificial intelligence |
DL | Deep learning |
CNN | Convolutional neural network |
ILSVRC | ImageNet Large Scale Visual Recognition Challenge |
ReLU | Rectified linear unit |
VGG | Visual Geometry Group |
3D | Three-dimensional |
CDNTS | Corner detection and nearest three-point selection |
2D | Two-dimensional |
AUC | Area under the curve |
GPU | Graphics processing unit |
CAD | Computer-aided design |
RGB | Red–green–blue |
MD5 | Message-digest algorithm 5 |
SGD | Stochastic gradient descent |
RMSprop | Root mean square propagation |
GB | Gigabyte |
RAM | Random access memory |
Hyperparameters | Values |
---|---|
Optimizers | Adam/Adadelta/RMSprop/SGD |
Batch Size | 32/128/256 |
Learning Rate | 0.1/0.01/0.001 |
Dropout | 0.5/0.8 |
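The values above define the search space only. As a concrete but illustrative sketch, a Keras grid search over these hyperparameters could be wired up as follows (the `build_model` stack is a toy placeholder, not the paper's AlexNet/SqueezeNet/VGG16 networks):

```python
import itertools
import tensorflow as tf

OPTIMIZERS = {
    "Adam": tf.keras.optimizers.Adam,
    "Adadelta": tf.keras.optimizers.Adadelta,
    "RMSprop": tf.keras.optimizers.RMSprop,
    "SGD": tf.keras.optimizers.SGD,
}
BATCH_SIZES = [32, 128, 256]
LEARNING_RATES = [0.1, 0.01, 0.001]
DROPOUTS = [0.5, 0.8]

def build_model(dropout: float) -> tf.keras.Model:
    # Toy binary classifier standing in for the paper's backbones.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu",
                               input_shape=(224, 224, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

for (name, opt_cls), bs, lr, dp in itertools.product(
        OPTIMIZERS.items(), BATCH_SIZES, LEARNING_RATES, DROPOUTS):
    model = build_model(dp)
    model.compile(optimizer=opt_cls(learning_rate=lr),
                  loss="binary_crossentropy", metrics=["accuracy"])
    print(f"training with {name}, Bs={bs}, Lr={lr}, dropout={dp}")
    # model.fit(train_images, train_labels, batch_size=bs, epochs=...)
```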
Model (Optimizer) | Exp 1.1 Synthetic Images Test: F-Score / Acc / AUC (%) | Exp 1.1 Lr, Bs | Exp 1.2 Real Images Test: F-Score / Acc / AUC (%) | Exp 1.2 Lr, Bs |
---|---|---|---|---|
AlexNet (all optimizers) | <55 / <55 / <55 | all Lr, all Bs | <55 / <55 / <55 | all Lr, all Bs |
SqueezeNet (AdaDelta) | 71 / 70 / 70 | Lr: 0.1, Bs: 256 | 76 / 73 / 72 | Lr: 0.001, Bs: 128 |
SqueezeNet (Adam) | 80 / 74 / 74 | Lr: 0.01, Bs: 32 | 63 / 58 / 57 | Lr: 0.1, Bs: 128 |
VGG16 (SGD) | 76 / 70 / 70 | Lr: 0.01, Bs: 32 | 69 / 64 / 63 | Lr: 0.001, Bs: 32 |
Model (Optimizer) | Exp 2.1 Synthetic Images Test: F-Score / Acc / AUC (%) | Exp 2.1 Lr, Bs | Exp 2.2 Real Images Test: F-Score / Acc / AUC (%) | Exp 2.2 Lr, Bs |
---|---|---|---|---|
CDNTS + AlexNet (SGD) | 90 / 90 / 89.9 | Lr: 0.1, Bs: 256 | 77 / 75 / 75 | Lr: 0.001, Bs: 256 |
CDNTS + SqueezeNet (Adam) | 84 / 84 / 83.7 | Lr: 0.01, Bs: 256 | 91 / 89 / 87.8 | Lr: 0.01, Bs: 32 |
CDNTS + SqueezeNet (RMSProp) | 79 / 76 / 75.6 | Lr: 0.001, Bs: 128 | 91 / 90 / 88.9 | Lr: 0.01, Bs: 256 |
CDNTS + VGG16 (SGD) | 88 / 88 / 87.4 | Lr: 0.001, Bs: 32 | 62 / 58 / 57.7 | Lr: 0.001, Bs: 32 |
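For reference, the F-score, accuracy, and AUC columns of these tables can be computed from a model's test-set outputs as in the short sketch below (the labels and probabilities are illustrative stand-ins, and treating RW-UAV as the positive class is an assumption):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Illustrative stand-ins: ground-truth labels (1 = RW-UAV, 0 = bird) and
# the network's sigmoid output probability for each test image.
y_true = np.array([1, 0, 1, 1, 0, 0])
y_prob = np.array([0.92, 0.13, 0.41, 0.88, 0.35, 0.07])

y_pred = (y_prob >= 0.5).astype(int)  # hard labels at a 0.5 threshold
print(f"F-Score {f1_score(y_true, y_pred) * 100:.1f}%")
print(f"Acc     {accuracy_score(y_true, y_pred) * 100:.1f}%")
print(f"AUC     {roc_auc_score(y_true, y_prob) * 100:.1f}%")  # uses raw scores
```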
Process Name | Process Time per Image (ms) |
---|---|
Train dataset | 36.7 |
Synthetic test dataset | 63.0 |
Real test dataset | 75.9 |
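Per-image times like those above can be obtained with a plain wall-clock average; the helper below is an illustrative sketch, not the paper's actual instrumentation:

```python
import time

def mean_time_per_image_ms(process, images) -> float:
    """Average wall-clock time of `process` (e.g., the CDNTS transform or a
    network forward pass) over a sequence of images, in milliseconds."""
    start = time.perf_counter()
    for img in images:
        process(img)
    return (time.perf_counter() - start) * 1000.0 / len(images)
```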
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).