An Improved YOLOV5 Based on Triplet Attention and Prediction Head Optimization for Marine Organism Detection on Underwater Mobile Platforms
Figure 1. Marine object detection: (a) detection by YOLOV5; (b) detection by the proposed model.
Figure 2. The architecture of the proposed marine organism identification model.
Figure 3. Structure and framework of the triplet attention module.
Figure 4. Target size distribution in the marine organism dataset.
Figure 5. Experimental comparison of enhancement performance on underwater images.
Figure 6. Visualization of Grad-CAM results for each species of marine organism; the ground truth in the raw images is marked with red boxes.
Figure 7. Performance comparison of prediction head optimization on detection of tiny marine organisms.
Figure 8. Loss curves of the proposed model and the other comparable models.
Figure 9. Marine organism identification results of the proposed model and the other comparable models.
Abstract
1. Introduction
- (1) The performance of three state-of-the-art attention modules for underwater object detection with YOLOV5 was evaluated. Triplet attention-based YOLOV5 achieved the best accuracy, since triplet attention captures cross-dimension interactions between the channel and spatial dimensions.
- (2) An optimization strategy for underwater small-target detection was presented by developing a four-scale prediction head for YOLOV5, which maintained detection performance for targets with large variations in size.
- (3) The improved YOLOV5 was further tested on an embedded device (Nvidia Jetson Nano), where inference achieved real-time performance (0.25 s per frame). The overall processing time for one frame was 0.98 s: 0.25 s for detection and 0.73 s for image enhancement.
- (4) An underwater image enhancement algorithm, relative global histogram stretching, was employed to improve underwater image quality. Experimental results showed that image enhancement was effective in improving the performance of underwater target detection.
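The core idea behind the histogram stretching in item (4) can be sketched in a few lines. This is a deliberately simplified, pure-Python stand-in: the actual relative global histogram stretching algorithm acquires its stretching range adaptively per channel, whereas the percentile bounds used here are illustrative assumptions.

```python
def stretch_channel(values, low_pct=0.5, high_pct=99.5, out_min=0, out_max=255):
    """Linearly stretch one color channel between two global percentiles.

    A simplified stand-in for relative global histogram stretching: the
    real algorithm picks the stretching range adaptively for each channel.
    """
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(n * low_pct / 100)]
    hi = ordered[min(int(n * high_pct / 100), n - 1)]
    if hi == lo:  # flat channel: nothing to stretch
        return list(values)
    scale = (out_max - out_min) / (hi - lo)
    # Clip to the output range after stretching.
    return [min(out_max, max(out_min, (v - lo) * scale + out_min)) for v in values]

# Underwater images often occupy a narrow band of intensities;
# stretching expands that band to the full display range.
channel = [60, 70, 80, 90, 100]
print(stretch_channel(channel, low_pct=0, high_pct=100))
```

Applied per color channel, this widens the compressed intensity range typical of turbid underwater scenes, which is why detection accuracy improves on the enhanced images.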
2. Materials and Methods
2.1. Experimental Data
2.2. Model Backbone
2.3. Triplet Attention Module
2.4. Prediction Head Optimization
2.5. Evaluation Metrics
3. Experiments and Discussion
3.1. Performance of Image Enhancement
3.2. Evolution of Attention Modules
3.3. Performance of Prediction Head Optimization
3.4. Overall Identification Performance Evaluation
3.5. Adaptation Performance Evaluation
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
| Model | AP@0.5 (Urchin) | AP@0.5 (Sea Cucumber) | AP@0.5 (Starfish) | AP@0.5 (Scallop) | mAP@0.5 | Ground Truth |
|---|---|---|---|---|---|---|
| YOLOV5 (Raw) | 82.2% | 62.6% | 80.4% | 60.9% | 71.5% | 6207 |
| YOLOV5 (En) | 91.3% | 62.9% | 86.2% | 71.5% | 78.0% | 6530 |
| Model | AP@0.5 (Urchin) | AP@0.5 (Sea Cucumber) | AP@0.5 (Starfish) | AP@0.5 (Scallop) | mAP@0.5 | Ground Truth |
|---|---|---|---|---|---|---|
| YOLOV5 (Raw) | 82.2% | 62.6% | 80.4% | 60.9% | 71.5% | 6207 |
| YOLOV5-SE (Raw) | 83.8% | 63.7% | 81.2% | 56.7% | 71.4% | 6207 |
| YOLOV5-CBAM (Raw) | 85.4% | 61.6% | 81.2% | 60.4% | 72.1% | 6207 |
| YOLOV5-TA (Raw) | 83.5% | 61.9% | 82.7% | 63.2% | 72.8% | 6207 |
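The Z-pool operation at the heart of the triplet attention module concatenates max-pooling and average-pooling along one axis of the feature tensor. A pure-Python sketch of that operation follows; for brevity it omits the 7×7 convolution and sigmoid that each branch of the real module then applies to the pooled two-channel map.

```python
def z_pool(channels):
    """Z-pool across the channel dimension, as in triplet attention.

    `channels` is a list of 2D feature maps (lists of rows); the result is
    a 2-channel map: elementwise max and elementwise mean over channels.
    Each branch of the module rotates the tensor before pooling, so the
    axis playing the role of "channel" differs per branch.
    """
    n = len(channels)
    h, w = len(channels[0]), len(channels[0][0])
    max_map = [[max(c[i][j] for c in channels) for j in range(w)] for i in range(h)]
    mean_map = [[sum(c[i][j] for c in channels) / n for j in range(w)] for i in range(h)]
    return max_map, mean_map

# Two 2x2 feature maps stacked along the channel axis.
c0 = [[1.0, 2.0], [3.0, 4.0]]
c1 = [[4.0, 0.0], [2.0, 6.0]]
mx, mn = z_pool([c0, c1])
```

Because the three branches pool over different rotated views and their outputs are averaged, the module captures the channel-spatial cross-dimension interactions noted above at negligible parameter cost.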
| Model | AP@0.5 (Urchin) | AP@0.5 (Sea Cucumber) | AP@0.5 (Starfish) | AP@0.5 (Scallop) | mAP@0.5 | Ground Truth |
|---|---|---|---|---|---|---|
| YOLOV5 (Raw) | 82.2% | 62.6% | 80.4% | 60.9% | 71.5% | 6207 |
| YOLOV5-PH (Raw) | 85.5% | 63.6% | 82.6% | 60.1% | 72.9% | 6207 |
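The benefit of a fourth prediction head comes from predicting on a finer feature grid. The sketch below shows the resulting grid sizes, assuming the extra head taps a stride-4 (P2) feature map of a 640×640 input, which is a common four-scale configuration; the paper's exact layer choice may differ.

```python
INPUT_SIZE = 640

# Standard YOLOv5 predicts at strides 8/16/32; the four-scale variant
# assumed here adds a stride-4 head for tiny objects.
three_scale = [8, 16, 32]
four_scale = [4, 8, 16, 32]

def grid_sizes(strides, size=INPUT_SIZE):
    """Feature-map resolution (cells per side) of each prediction head."""
    return [size // s for s in strides]

print(grid_sizes(three_scale))  # coarser grids only
print(grid_sizes(four_scale))   # adds a 160x160 grid for small targets
```

Each cell of the added 160×160 grid covers only 4×4 input pixels, so organisms spanning just a few pixels can still be assigned to a cell of roughly their own size.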
| Model | AP@0.5 (Urchin) | AP@0.5 (Sea Cucumber) | AP@0.5 (Starfish) | AP@0.5 (Scallop) | mAP@0.5 | mAP@0.5:0.95 | Detection Time (s) | Ground Truth |
|---|---|---|---|---|---|---|---|---|
| YOLOV5 (Raw) | 82.2% | 62.6% | 80.4% | 60.9% | 71.5% | 35.1% | 0.248 | 6207 |
| YOLOV5 (En) | 91.3% | 62.9% | 86.2% | 71.5% | 78.0% | 40.7% | 0.917 | 6530 |
| YOLOV5-TA (En) | 92.0% | 65.6% | 87.1% | 73.1% | 79.4% | 41.1% | 0.924 | 6530 |
| YOLOV5-PH (En) | 93.1% | 63.9% | 88.0% | 74.5% | 79.9% | 40.6% | 0.999 | 6530 |
| Faster RCNN (resnet50) (En) | 73.0% | 31.0% | 57.1% | 44.1% | 51.3% | 20.7% | 6.817 | 6530 |
| Faster RCNN (resnet101) (En) | 85.9% | 46.0% | 78.0% | 36.7% | 61.6% | 29.8% | 11.370 | 6530 |
| Ours (YOLOV5-TA+PH) (En) | 93.4% | 69.8% | 89.3% | 79.7% | 83.1% | 42.2% | 0.982 | 6530 |
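The mAP@0.5 column in each table is the arithmetic mean of the four per-class APs, which makes the rows easy to sanity-check, e.g. for the baseline YOLOV5 (Raw) row:

```python
def mean_ap(aps):
    """mAP as the mean of per-class average precisions (values in percent)."""
    return sum(aps) / len(aps)

# Per-class APs for YOLOV5 (Raw): urchin, sea cucumber, starfish, scallop.
baseline = [82.2, 62.6, 80.4, 60.9]
print(round(mean_ap(baseline), 1))  # matches the reported 71.5%
```

The same check holds for the other rows, e.g. the enhanced baseline (91.3 + 62.9 + 86.2 + 71.5) / 4 = 78.0%.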
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, Y.; Bai, X.; Xia, C. An Improved YOLOV5 Based on Triplet Attention and Prediction Head Optimization for Marine Organism Detection on Underwater Mobile Platforms. J. Mar. Sci. Eng. 2022, 10, 1230. https://doi.org/10.3390/jmse10091230