Lightweight SM-YOLOv5 Tomato Fruit Detection Algorithm for Plant Factory
<p>Attribute visualization results of the dataset in this study: (<b>a</b>) the number of dataset labels, (<b>b</b>) the label ratio of the dataset, (<b>c</b>) the label locations of the dataset, (<b>d</b>) the label sizes of the dataset.</p>
<p>Diagram illustrating dataset annotation using LabelImg.</p>
<p>The integrated architecture of SM-YOLOv5 includes a backbone network (in blue) that was replaced with MobileNetV3-Large. The small-target detection layer added to the original three-layer target detection model is represented by the red box. The FPN and PAN structures (in yellow and cyan boxes, respectively) were supplemented with a small object detection layer to enhance the detection of small targets.</p>
<p>Flowchart illustrating the training and detection process of SM-YOLOv5, with the training phase represented by orange boxes and the detection phase represented by green boxes.</p>
<p>Schematic diagram of separable convolution.</p>
<p>Comparison of multi-layer detection results. Detection results for (<b>a</b>) large targets, (<b>b</b>) medium targets, (<b>c</b>) small targets, and (<b>d</b>) multi-layer target fusion detection. Borders and text background colors indicate whether the recognized classification was “green” or “red” fruit. White circle callouts indicate tomato fruits that were not correctly identified.</p>
<p>Training results of different models.</p>
<p>Visualization of the results from ablation experiments conducted using the YOLOv5, S-YOLOv5, M-YOLOv5, and SM-YOLOv5 methods.</p>
Abstract
1. Introduction
- The CSPDarknet53 backbone network was replaced with the MobileNetV3-Large lightweight network in this study. The lightweight network employs squeeze-and-excitation attention modules to efficiently extract features along the channel dimension. This replacement reduced both the model size and the computational demands.
- To improve the accuracy of the lightweight model in detecting small tomato fruits in images, a small object detection layer was introduced into the network architecture. This additional layer extracts more features, improving detection accuracy for small, overlapping, or obscured tomato fruits.
- These enhancements are of high importance in plant factories, where the accurate detection of small objects is crucial for effective and precise plant monitoring and management. The lightweight network can also be deployed on embedded picking robots for tomato fruit detection, further highlighting its practical application potential.
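Much of MobileNetV3-Large's saving comes from depthwise separable convolutions (DWS-Conv), which factor a standard convolution into a depthwise step (DW-Conv) and a 1 × 1 pointwise step (PW-Conv). The following back-of-the-envelope sketch illustrates the parameter reduction; the 112 → 160-channel example is our own illustration, not a layer taken from the paper:

```python
def conv_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def dws_conv_params(k, c_in, c_out):
    """Depthwise separable conv: k x k depthwise (DW-Conv) + 1 x 1 pointwise (PW-Conv)."""
    return k * k * c_in + c_in * c_out

# Example: a 3 x 3 convolution mapping 112 channels to 160 channels.
std = conv_params(3, 112, 160)      # 161,280 parameters
dws = dws_conv_params(3, 112, 160)  # 18,928 parameters
print(std, dws, round(dws / std, 3))
```

The ratio works out to roughly 1/C_out + 1/k² (about 0.12 here), which is why swapping the backbone cuts FLOPs and weight size so sharply.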
2. Materials and Methods
2.1. Data Acquisition and Preprocessing
2.1.1. Image Acquisition
2.1.2. Dataset Annotation and Augmentation
2.2. Experimental Environment
2.3. Model Evaluation Metrics
3. Proposed SM-YOLOv5 Model
3.1. Lightweight MobileNetV3-Large Backbone Network
3.2. Small-Target Detection Layer
3.3. Trained Anchors and Transfer Learning
4. Results and Analyses
4.1. SM-YOLOv5 Training and Validation
4.2. SM-YOLOv5 Model Testing
4.3. Performance Comparison
4.4. Ablation Experiment
5. Discussion
6. Conclusions
- Lightweight: The proposed model backbone was replaced with the MobileNetV3-Large network, which is a lightweight architecture that reduced the model’s FLOPs to 7.6 GFLOPs and its size to 6.3 MB.
- Small-target detection: The additional detection layer resulted in the improved performance of the proposed algorithm in detecting tomatoes that were obscured, overlapping, or small in size.
- Accuracy: The proposed model was scaled down by replacing the backbone with a lightweight alternative. To preserve accurate detection despite the lightweight design, a small-target detection layer was integrated into the architecture. These enhancements yielded a significant improvement in accuracy, with the model achieving an mAP of 98.8% on the test set.
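Detecting overlapping fruit, as highlighted above, interacts with the non-maximum suppression (NMS) step that prunes redundant candidate boxes before final output. A minimal greedy NMS sketch is shown below; the 0.45 IoU threshold is an illustrative default, not a value reported in this paper:

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy NMS: keep the highest-scoring box, drop overlaps above the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

# Two heavily overlapping boxes plus one distant box: the lower-scoring overlap is suppressed.
kept = nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)], [0.9, 0.8, 0.7])
print(kept)  # [0, 2]
```

Too low a threshold suppresses genuinely distinct, touching tomatoes, which is one reason dedicated small-target features help with overlapping fruit.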
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
CHT | Circular Hough transform |
CNNs | Convolutional neural networks |
COCO | Common Objects in Context |
CUDA | Compute unified device architecture |
DCNNs | Deep convolutional neural networks |
DW-Conv | Depthwise convolution |
DWS-Conv | Depthwise separable convolution |
FLOPs | Floating-point operations |
FNCC | Fast normalized cross-correlation function |
GT | Ground truth |
HOG | Histogram of oriented gradients |
IoU | Intersection over union |
M-YOLOv5 | MobileNet-YOLOv5 |
NMS | Non-maximum suppression |
PF | Plant factory |
PW-Conv | Pointwise convolution |
RCNN | Region-based convolutional neural network |
SIFT | Scale-invariant feature transform |
S-YOLOv5 | Small-YOLOv5 |
SM-YOLOv5 | Small-MobileNet-YOLOv5 |
SSD | Single-shot MultiBox detector |
SVM | Support vector machine |
VOC | Visual object classes |
XML | Extensible markup language |
YOLO | You Only Look Once |
References
Set | Number of Images | Number of Green Tomato Samples | Number of Red Tomato Samples |
---|---|---|---|
Training | 462 | 4183 | 4104 |
Validation | 132 | 1240 | 1160 |
Testing | 66 | 607 | 522 |
Total | 660 | 6030 | 5786 |
Layer | Input Size | Kernel Size | Expand | #out 1 | SE 2 | NL 3 | s 6 | Detection Layer |
---|---|---|---|---|---|---|---|---|
1 | 320, 320, 8 | 3 × 3 | 16 | 16 | | RE 4 | 1 | |
2 | 320, 320, 8 | 3 × 3 | 64 | 24 | | RE | 2 | |
3 | 160, 160, 16 | 3 × 3 | 72 | 24 | | RE | 1 | Detection1 5 |
4 | 160, 160, 16 | 5 × 5 | 72 | 40 | ✓ 7 | RE | 2 | |
5 | 80, 80, 24 | 5 × 5 | 120 | 40 | ✓ | RE | 1 | |
6 | 80, 80, 24 | 5 × 5 | 120 | 40 | ✓ | RE | 1 | Detection2 5 |
7 | 80, 80, 24 | 3 × 3 | 240 | 80 | | HS 4 | 2 | |
8 | 40, 40, 40 | 3 × 3 | 200 | 80 | | HS | 1 | |
9 | 40, 40, 40 | 3 × 3 | 184 | 80 | | HS | 1 | |
10 | 40, 40, 40 | 3 × 3 | 184 | 80 | | HS | 1 | |
11 | 40, 40, 40 | 3 × 3 | 480 | 112 | ✓ | HS | 1 | |
12 | 40, 40, 56 | 3 × 3 | 672 | 112 | ✓ | HS | 1 | |
13 | 40, 40, 56 | 5 × 5 | 672 | 160 | ✓ | HS | 2 | Detection3 5 |
14 | 40, 40, 80 | 5 × 5 | 960 | 160 | ✓ | HS | 1 | |
15 | 20, 20, 40 | 5 × 5 | 960 | 160 | ✓ | HS | 1 | Detection4 5 |
Downsampling | COCO Anchor | Tomato Anchor | Our Anchor |
---|---|---|---|
4× | | | 18 × 17, 22 × 23, 26 × 25 |
8× | 10 × 13, 16 × 30, 33 × 23 | 19 × 19, 30 × 29, 40 × 38 | 34 × 34, 52 × 52, 45 × 44 |
16× | 30 × 61, 62 × 45, 59 × 119 | 51 × 49, 62 × 63, 76 × 73 | 57 × 53, 65 × 66, 81 × 76 |
32× | 116 × 90, 156 × 198, 373 × 326 | 89 × 85, 108 × 106, 132 × 130 | 93 × 91, 110 × 108, 134 × 133 |
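Dataset-specific anchors such as those in the Tomato and Our Anchor columns are typically obtained by clustering the ground-truth box sizes; a common choice, which we assume here for illustration (the paper's exact procedure may differ), is k-means over width–height pairs with 1 − IoU as the distance:

```python
def wh_iou(box, centroid):
    """IoU of two (w, h) boxes assumed to share the same center."""
    iw, ih = min(box[0], centroid[0]), min(box[1], centroid[1])
    inter = iw * ih
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100):
    """Cluster (w, h) boxes into k anchors, maximizing IoU to the nearest centroid."""
    boxes = sorted(boxes)
    # Deterministic init: spread initial centroids over the size-sorted boxes.
    if k > 1:
        centroids = [boxes[i * (len(boxes) - 1) // (k - 1)] for i in range(k)]
    else:
        centroids = [boxes[0]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            j = max(range(k), key=lambda i: wh_iou(b, centroids[i]))
            clusters[j].append(b)
        centroids = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids

# Toy example: two clearly separated size groups collapse to their mean sizes.
print(kmeans_anchors([(10, 12), (11, 11), (9, 10), (100, 100), (98, 102), (102, 98)], k=2))
```

In a YOLOv5-style setup, clustering with k = 12 and assigning the three smallest anchors to the added 4× layer would reproduce the four-scale layout shown in the table.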
Tomato Fruit Color | Precision (%) | Recall (%) | AP (%) | mAP (%) |
---|---|---|---|---|
Green | 98.0 | 96.2 | 98.6 | 98.8 |
Red | 98.5 | 96.8 | 99.0 | |
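As the table implies, the reported mAP is the unweighted mean of the per-class AP values, while precision and recall follow their standard definitions from true-positive, false-positive, and false-negative counts. A quick sanity check (the helper functions are the generic textbook definitions, not the paper's code):

```python
def precision(tp, fp):
    """Fraction of predicted fruits that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of ground-truth fruits that were found: TP / (TP + FN)."""
    return tp / (tp + fn)

# Per-class AP values from the table above; mAP is their unweighted mean.
ap = {"green": 98.6, "red": 99.0}
m_ap = sum(ap.values()) / len(ap)
print(round(m_ap, 1))  # 98.8
```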
Network | Backbone | Number of Detection Layers | mAP (%) | GFLOPs | Weight Size (MB) |
---|---|---|---|---|---|
YOLOv5 | CSPDarknet53 | 3 | 97.4 | 15.8 | 14.9 |
SSD | VGG16 | 6 | 90.7 | 30.5 | 182.0 |
YOLOv3 | Darknet53 | 3 | 97.5 | 154.9 | 470.2 |
Faster RCNN | VGG16 | Region proposal network | 81.2 | 63.9 | 522.0 |
SM-YOLOv5 | MobileNetV3-Large | 4 | 98.8 | 7.6 | 6.3 |
Network | Small-Target Detection Layer | Improved Backbone | Detection Layers | Precision (%) | Recall (%) | mAP (%) | GFLOPs | Weight Size (MB) |
---|---|---|---|---|---|---|---|---|
YOLOv5 | | | 3 | 97.9 | 94.8 | 97.4 | 15.8 | 14.9 |
S-YOLOv5 | ✓ 1 | | 3 + 1 | 98.0 | 96.2 | 98.0 | 23.4 | 14.9 |
M-YOLOv5 | | ✓ | 3 | 97.9 | 95.4 | 98.3 | 4.7 | 6.33 |
SM-YOLOv5 | ✓ | ✓ | 3 + 1 | 97.8 | 96.7 | 98.8 | 7.6 | 6.33 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, X.; Wu, Z.; Jia, M.; Xu, T.; Pan, C.; Qi, X.; Zhao, M. Lightweight SM-YOLOv5 Tomato Fruit Detection Algorithm for Plant Factory. Sensors 2023, 23, 3336. https://doi.org/10.3390/s23063336