Using Generative Module and Pruning Inference for the Fast and Accurate Detection of Apple Flower in Natural Environments
Figure 1. Image dataset was collected in Taolin Village at three scales.
Figure 2. Dataset overview. (A) distribution of the number of apple flowers per image; (B) distribution of the different scales of apple flowers per image; (C) four samples with different numbers of apple flowers.
Figure 3. Simple data augmentation.
Figure 4. Illustration of five augmentation methods. (A) Mixup; (B) Cutout; (C) CutMix; (D) SnapMix; (E) Mosaic.
Figure 5. Illustration of the generative module on the YOLO-v5 structure.
Figure 6. Illustration of three implementations of the generative module.
Figure 7. Illustration of the pruning process of the feature pyramid network and UNet.
Figure 8. The learning rate of two warm-up schemes.
Figure 9. Training curves of accuracy and loss against the number of iterations for the YOLO series.
Figure 10. Training curves of accuracy and loss against the number of iterations for the SSD series.
Figure 11. Training curves of accuracy and loss against the number of iterations for the EfficientDet series, part I.
Figure 12. Training curves of accuracy and loss against the number of iterations for the EfficientDet series, part II.
Figure 13. Demonstration of GM-EfficientDet-D5's effectiveness. (A) large scale; (B) medium scale; (C) small scale.
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset Analysis
2.2. Data Augmentation
2.2.1. Simple Augmentation
2.2.2. Advanced Augmentation
- Mixup [20] is designed to mitigate the memorization of training data by large networks and their unsatisfactory sensitivity to adversarial samples, as shown in Figure 4A. Since the model we used includes the generative module, improving the network's robustness to adversarial samples can improve the accuracy of the generative module, and thus the regularization effect of the final generated images. Mixup encourages the model to behave linearly between training samples, so its judgments on any single sample are less absolute, which reduces overfitting.
- Cutout [21] randomly cuts out part of the sample and fills it with fixed pixel values, while the classification label remains unchanged. Cutout masks the image with a fixed-size rectangle in which all values are set to 0 or another solid color, as shown in Figure 4B. Cutout forces the convolutional neural network to use the global information of the whole image instead of the local information of a few minor features.
- CutMix [22] also cuts out part of the image, but instead of filling it with 0 pixels, it stochastically fills the region with pixel values from other images in the training set, as shown in Figure 4C. CutMix enables the model to identify two targets from a local view of one image, improving training efficiency, and it makes the model focus on areas where the target is difficult to distinguish. By contrast, the zero-filled areas of Cutout carry no information, which hurts training efficiency.
- Mosaic [24] can utilize multiple images at once. Its most important advantage is that it enriches the backgrounds of the detected objects, and the statistics of multiple images enter the BatchNorm calculation, which effectively improves the model's generalization. In this paper, we used multiple images, each containing between 5 and 10 apple flowers, to generate a single image containing at least 20 apple flowers via Mosaic, as shown in Figure 4E. In this way, we improved the model's recognition performance on high-density images. A minimal code sketch of the Mixup, Cutout, and CutMix operations follows this list.
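To make the three mixing strategies concrete, here is a minimal NumPy sketch of Mixup, Cutout, and CutMix in the spirit of [20,21,22]; the function signatures, the Beta(α, α) sampling, and the fixed patch size are illustrative assumptions rather than the paper's exact implementation.

```python
import numpy as np

def mixup(img_a, img_b, label_a, label_b, alpha=1.0):
    """Mixup: blend two images and their labels with a Beta-sampled weight."""
    lam = np.random.beta(alpha, alpha)
    img = lam * img_a + (1.0 - lam) * img_b
    label = lam * label_a + (1.0 - lam) * label_b  # soft label
    return img, label

def cutout(img, size=56, fill=0):
    """Cutout: overwrite a fixed-size square at a random position; the
    label is left unchanged."""
    h, w = img.shape[:2]
    y, x = np.random.randint(h), np.random.randint(w)
    y1, y2 = max(0, y - size // 2), min(h, y + size // 2)
    x1, x2 = max(0, x - size // 2), min(w, x + size // 2)
    out = img.copy()
    out[y1:y2, x1:x2] = fill
    return out

def cutmix(img_a, img_b, label_a, label_b, alpha=1.0):
    """CutMix: paste a random crop of img_b into img_a; mix labels by
    the actual area ratio of the pasted region."""
    h, w = img_a.shape[:2]
    lam = np.random.beta(alpha, alpha)
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    y = np.random.randint(0, h - cut_h + 1)
    x = np.random.randint(0, w - cut_w + 1)
    out = img_a.copy()
    out[y:y + cut_h, x:x + cut_w] = img_b[y:y + cut_h, x:x + cut_w]
    lam = 1 - (cut_h * cut_w) / (h * w)
    return out, lam * label_a + (1 - lam) * label_b
```

The sketch shows the image-level (classification) form for brevity; in a detection pipeline such as this paper's, the bounding-box annotations must be adjusted accordingly.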
2.3. Generative Module
- VAE [26] is suitable for generating unseen data but cannot control the generated content. CVAE (Conditional VAE) [27] can generate the desired data by specifying its label during generation, so CVAE can be used as an implementation of the generative module. When generating data, we first sample from a normal distribution, splice in the label of the data to be generated, and pass the spliced vector into the decoder; the decoder then generates data corresponding to that label, as shown in Figure 6A.
- The generator of a GAN [28] can only generate images from random noise and has no control over which labeled image is produced; likewise, the discriminator only receives an image and judges whether it came from the generator. CGAN [29] adds additional information to the inputs of both the generator and the discriminator. If this additional information is the image's label, the generator can be steered to produce an image with a specific label, as shown in Figure 6B. Therefore, CGAN can be used as an implementation of the generative module.
- CVAE-GAN. The network structure of CVAE-GAN is shown in Figure 6C; it combines the features of CVAE and CGAN. Although it helps to improve the quality of the generated images, the additional units make the network more complex and may reduce its speed during inference. A sketch of the label-conditioning mechanism shared by these implementations follows this list.
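The common thread of the three implementations above is that the label is spliced into the generator's (or decoder's) input so that generation can be steered toward a chosen class. A minimal PyTorch sketch of this conditioning; the layer sizes and class name are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Minimal CGAN-style generator: the noise vector z is concatenated
    with a one-hot class label, so the label controls what is generated."""
    def __init__(self, z_dim=100, n_classes=2, img_dim=64 * 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + n_classes, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, img_dim),
            nn.Tanh(),
        )

    def forward(self, z, y_onehot):
        # splice the label into the input, as described for CVAE/CGAN above
        return self.net(torch.cat([z, y_onehot], dim=1))

# A CVAE decoder conditions the same way: sample z ~ N(0, I),
# splice in the label, and decode.
z = torch.randn(8, 100)                                          # latent samples
y = nn.functional.one_hot(torch.zeros(8, dtype=torch.long), 2).float()
fake = ConditionalGenerator()(z, y)                              # images of class 0
```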
2.4. Pruning Inference
2.5. Loss Function
2.6. Warm-Up
2.7. Evaluation Metrics
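The results tables below report precision (P), recall (R), and mAP. Assuming the standard definitions (this excerpt does not restate them), with TP/FP/FN counted at a fixed IoU threshold:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall}    = \frac{TP}{TP + FN}, \qquad
\mathrm{AP} = \int_0^1 P(R)\,\mathrm{d}R, \qquad
\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{AP}_i
```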
3. Results
3.1. Experiment
3.1.1. Equipment
3.1.2. Baseline Experiment
3.2. Results and Analysis
4. Discussion
4.1. Ablation Experiment about Generative Module
4.2. Ablation Experiment about Pruning Inference
4.3. Module Analysis
- Branches added against the overfitting of complex network structures. As the network becomes more compound after our hybrid improvements, its capacity to overfit increases. To reduce the possibility of overfitting, the model incorporates the generative module. Through this module, a result of the adversarial process is obtained; since the highest-dimensional features are extracted and simulated, this result is combined with the upper part of the detection model, and the generated results enter the loss calculation together with the detection outputs, improving the robustness of the whole detection network.
- Pruning inference added to the network. In general, the higher the detection performance, the better; however, performance gains often come with a considerable time cost. Moreover, a deeper neural network does not necessarily produce better results: owing to overfitting, the results of deeper networks may even be inferior to those of shallower layers. Therefore, whether to prune the model is decided at training time according to the given conditions. We also zero the input of the generative-module branch to achieve structural deactivation, which significantly improves the training speed and even reduces the overfitting of the neural network, achieving "one model, polymorphism". A minimal sketch of this deactivation idea is shown below.
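A minimal PyTorch sketch of the structural deactivation described above, assuming a detector with an optional generative branch; the class name, flag, and module decomposition are our illustration, not the paper's code:

```python
import torch.nn as nn

class PrunableDetector(nn.Module):
    """Sketch of 'one model, polymorphism': the generative branch is
    trained jointly with the detector, but can be deactivated (pruned)
    for fast inference."""
    def __init__(self, backbone, head, gen_branch):
        super().__init__()
        self.backbone, self.head, self.gen_branch = backbone, head, gen_branch
        self.use_gen = True  # toggled off to prune the branch

    def forward(self, x):
        feats = self.backbone(x)
        dets = self.head(feats)
        if self.use_gen:
            gen_out = self.gen_branch(feats)  # adversarial branch joins the loss
        else:
            gen_out = None  # equivalently: feed zeros and skip the computation
        return dets, gen_out
```

Toggling `use_gen` off reproduces the trade-off visible in the pruning-inference ablation table: a small drop in precision, recall, and mAP in exchange for a large gain in FPS.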
4.4. Smart Apple Flower Detection System
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
1. Wu, D.; Lv, S.; Jiang, M.; Song, H. Using channel pruning-based YOLO v4 deep learning algorithm for the real-time and accurate detection of apple flowers in natural environments. Comput. Electron. Agric. 2020, 178, 105742.
2. Dias, P.A.; Tabb, A.; Medeiros, H. Apple flower detection using deep convolutional networks. Comput. Ind. 2018, 99, 17–28.
3. Pathan, M.; Patel, N.; Yagnik, H.; Shah, M. Artificial cognition for applications in smart agriculture: A comprehensive review. Artif. Intell. Agric. 2020, 4, 81–95.
4. Weng, S.; Zhu, W.; Zhang, X.; Yuan, H.; Zheng, L.; Zhao, J.; Huang, L.; Han, P. Recent advances in Raman technology with applications in agriculture, food and biosystems: A review. Artif. Intell. Agric. 2019, 3, 1–10.
5. Zhang, W.; Hu, J.; Zhou, G.; He, M. Detection of Apple Defects Based on the FCM-NPGA and a Multivariate Image Analysis. IEEE Access 2020, 8, 38833–38845.
6. Guo, Z.; Wang, M.; Agyekum, A.A.; Wu, J.; Chen, Q.; Zuo, M.; El-Seedi, H.R.; Tao, F.; Shi, J.; Ouyang, Q.; et al. Quantitative detection of apple watercore and soluble solids content by near infrared transmittance spectroscopy. J. Food Eng. 2020, 279, 109955.
7. Jiang, B.; He, J.; Yang, S.; Fu, H.; Li, T.; Song, H.; He, D. Fusion of machine vision technology and AlexNet-CNNs deep learning network for the detection of postharvest apple pesticide residues. Artif. Intell. Agric. 2019, 1, 1–8.
8. Abbas, H.M.T.; Shakoor, U.; Khan, M.J.; Ahmed, M.; Khurshid, K. Automated Sorting and Grading of Agricultural Products based on Image Processing. In Proceedings of the 2019 8th International Conference on Information and Communication Technologies (ICICT), Karachi, Pakistan, 16–17 November 2019; pp. 78–81.
9. Sun, S.; Jiang, M.; He, D.; Long, Y.; Song, H. Recognition of green apples in an orchard environment by combining the GrabCut model and Ncut algorithm. Biosyst. Eng. 2019, 187, 201–213.
10. Mazzia, V.; Khaliq, A.; Salvetti, F.; Chiaberge, M. Real-Time Apple Detection System Using Embedded Systems With Hardware Accelerators: An Edge AI Application. IEEE Access 2020, 8, 9102–9114.
11. Islam, N.; Rashid, M.M.; Wibowo, S.; Xu, C.Y.; Morshed, A.; Wasimi, S.A.; Moore, S.; Rahman, S.M. Early Weed Detection Using Image Processing and Machine Learning Techniques in an Australian Chilli Farm. Agriculture 2021, 11, 387.
12. Xu, J.; Guga, S.; Rong, G.; Riao, D.; Liu, X.; Li, K.; Zhang, J. Estimation of Frost Hazard for Tea Tree in Zhejiang Province Based on Machine Learning. Agriculture 2021, 11, 607.
13. Lim, J.; Ahn, H.S.; Nejati, M.; Bell, J.; Williams, H.; MacDonald, B.A. Deep Neural Network Based Real-time Kiwi Fruit Flower Detection in an Orchard Environment. arXiv 2020, arXiv:2006.04343.
14. Afonso, M.; Mencarelli, A.; Polder, G.; Wehrens, R.; Lensink, D.; Faber, N. Detection of Tomato Flowers from Greenhouse Images Using Colorspace Transformations. In Progress in Artificial Intelligence; Moura Oliveira, P., Novais, P., Reis, L.P., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 146–155.
15. Tian, M.; Chen, H.; Wang, Q. Detection and Recognition of Flower Image Based on SSD network in Video Stream. J. Phys. Conf. Ser. 2019, 1237, 032045.
16. Biradar, B.V.; Shrikhande, S.P. Flower detection and counting using morphological and segmentation technique. Int. J. Comput. Sci. Inform. Technol. 2015, 6, 2498–2501.
17. Sun, K.; Wang, X.; Liu, S.; Liu, C. Apple, peach, and pear flower detection using semantic segmentation network and shape constraint level set. Comput. Electron. Agric. 2021, 185, 106150.
18. Farjon, G.; Krikeb, O.; Hillel, A.B.; Alchanatis, V. Detection and counting of flowers on apple trees for better chemical thinning decisions. Precis. Agric. 2020, 21, 503–521.
19. Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105.
20. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond empirical risk minimization. arXiv 2017, arXiv:1710.09412.
21. DeVries, T.; Taylor, G.W. Improved regularization of convolutional neural networks with cutout. arXiv 2017, arXiv:1708.04552.
22. Yun, S.; Han, D.; Oh, S.J.; Chun, S.; Choe, J.; Yoo, Y. CutMix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 6023–6032.
23. Huang, S.; Wang, X.; Tao, D. SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data. arXiv 2020, arXiv:2012.04846.
24. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
25. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
26. Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. arXiv 2013, arXiv:1312.6114.
27. Du, L.; Ding, X.; Liu, T.; Li, Z. Modeling event background for if-then commonsense reasoning using context-aware variational autoencoder. arXiv 2019, arXiv:1909.08824.
28. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Adv. Neural Inf. Process. Syst. 2014, 3, 2672–2680.
29. Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784.
30. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
31. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000.
32. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
33. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
34. Jocher, G. YOLOv5. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 26 October 2020).
35. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37.
36. Li, Z.; Zhou, F. FSSD: Feature fusion single shot multibox detector. arXiv 2017, arXiv:1712.00960.
37. Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; Li, S.Z. Single-shot refinement neural network for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4203–4212.
38. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
39. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99.
40. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
| Dataset | Large | Medium | Small | Total |
| --- | --- | --- | --- | --- |
| Original dataset | 1044 | 1022 | 46 | 2158 |
| After data augmentation | 15,660 | 15,330 | 6900 | 37,890 |
| Training set | 14,094 | 13,797 | 6210 | 34,101 |
| Validation set | 1566 | 1533 | 690 | 3789 |
| Model Parameters | Values |
| --- | --- |
| Initial learning rate | 0.02 |
| Image input batch size | 2 |
| Gamma | 0.1 |
| Maximum iterations | 200,000 |
| Model | Precision (%) | Recall (%) | mAP (%) | FPS |
| --- | --- | --- | --- | --- |
| YOLO-v3 | 84.77 | 94.19 | 90.97 | 39 |
| YOLO-v4 | 85.12 | 89.27 | 89.13 | 36 |
| YOLO-v5 | 87.13 | 92.75 | 91.82 | 42 |
| SSD | 71.03 | 82.49 | 80.34 | 17 |
| FSSD | 81.61 | 93.37 | 91.47 | 21 |
| RefineDet | 84.95 | 93.39 | 91.77 | 23 |
| EfficientDet-D2 | 84.57 | 88.19 | 86.39 | 47 |
| EfficientDet-D3 | 86.22 | 89.81 | 87.22 | 41 |
| EfficientDet-D4 | 85.71 | 91.49 | 88.69 | 42 |
| EfficientDet-D5 | 86.03 | 91.33 | 89.91 | 35 |
| EfficientDet-D6 | 85.71 | 91.49 | 83.47 | 33 |
| EfficientDet-D7 | 85.24 | 90.98 | 84.14 | 29 |
| Model | Precision (%) | Recall (%) | mAP (%) | FPS |
| --- | --- | --- | --- | --- |
| Faster RCNN | 79.87 | 87.93 | 84.18 | 37 |
| Mask RCNN | 81.99 | 91.03 | 87.26 | 39 |
| GM-Mask RCNN | 85.39 | 95.60 | 94.91 | 33 |
| YOLO-v5 | 87.13 | 92.75 | 91.82 | 42 |
| GM-YOLO-v5 | 89.77 | 96.48 | 93.90 | 38 |
| RefineDet | 84.95 | 93.39 | 91.77 | 23 |
| GM-RefineDet | 87.41 | 97.11 | 93.38 | 17 |
| EfficientDet-D5 | 86.03 | 91.33 | 89.91 | 35 |
| GM-EfficientDet-D5 | 90.01 | 98.79 | 97.43 | 29 |
| Object Size | Small | Medium | Large |
| --- | --- | --- | --- |
| YOLO-v5 (P) | 67.11 | 87.01 | 87.29 |
| YOLO-v5 (R) | 71.98 | 91.99 | 92.94 |
| YOLO-v5 (mAP) | 63.87 | 91.82 | 91.83 |
| GM-EfficientDet-D5 (P) | 78.18 | 89.93 | 90.25 |
| GM-EfficientDet-D5 (R) | 85.21 | 98.83 | 98.79 |
| GM-EfficientDet-D5 (mAP) | 83.94 | 97.42 | 97.45 |
| Model | GM | Precision (%) | Recall (%) | mAP (%) | FPS |
| --- | --- | --- | --- | --- | --- |
| GM-EfficientDet-D5 | CGAN | 90.01 | 98.79 | 97.43 | 29 |
| GM-EfficientDet-D5 | CVAE | 89.17 | 96.33 | 97.41 | 30 |
| GM-EfficientDet-D5 | CVAE-GAN | 90.03 | 98.50 | 97.61 | 25 |
| GM-YOLO-v5-PI | CGAN | 85.28 | 89.20 | 88.47 | 71 |
| GM-YOLO-v5-PI | CVAE | 84.71 | 89.31 | 89.02 | 76 |
| GM-YOLO-v5-PI | CVAE-GAN | 91.27 | 94.12 | 93.18 | 47 |
| Model | Strategy | Precision (%) | Recall (%) | mAP (%) | FPS |
| --- | --- | --- | --- | --- | --- |
| GM-EfficientDet-D5 | baseline | 90.01 | 98.79 | 97.43 | 29 |
| GM-EfficientDet-D5 | PI | 89.13 | 98.10 | 96.18 | 51 |
| EfficientDet-D5 | baseline | 86.03 | 91.33 | 89.91 | 35 |
| EfficientDet-D5 | PI | 85.91 | 89.18 | 88.33 | 53 |
| GM-YOLO-v5 | baseline | 89.77 | 96.48 | 93.90 | 38 |
| GM-YOLO-v5 | PI | 89.14 | 96.27 | 93.15 | 63 |
| YOLO-v5 | baseline | 87.13 | 92.75 | 91.82 | 42 |
| YOLO-v5 | PI | 85.28 | 89.20 | 88.47 | 71 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).