BGLE-YOLO: A Lightweight Model for Underwater Bio-Detection
Figure 1. Panels (a,b) illustrate underwater image characteristics of low contrast and small targets, respectively; panels (c,d) illustrate underwater blurring and color deviation caused by various forms of attenuation.
Figure 2. (a) Architecture of YOLOv8. (b) Architecture of BGLE-YOLO, which adds the BiFPN network, the GLSA attention block, EMC convolution, and the LSH detection head to (a).
Figure 3. Structure of EMC. The input feature map is split along the channel dimension, and the independent multi-channel features are then fused into the output feature map.
Figure 4. Structures of FPN, PANet, NAS-FPN, and BiFPN.
Figure 5. The global-to-local spatial aggregation (GLSA) module.
Figure 6. Convolution in LSH consists of group-normalized convolution (Conv-GN) and group-normalized detail-enhanced convolution (DEConv). The red pixels are normalized with the same mean and variance, which are computed jointly over the values of those pixels.
Figure 7. Comparison of YOLO-family algorithms on the DUO dataset: (a) precision; (b) mAP@0.5; (c) mAP@0.5:0.95.
Figure 8. Comparison of YOLO-family algorithms on the RUOD dataset: (a) precision; (b) mAP@0.5; (c) mAP@0.5:0.95.
Figure 9. Comparison of the parameter counts and computational costs of different models.
Figure 10. Qualitative comparison of the detection performance of the YOLO-series models; (a–c) show detection results for four categories in the DUO dataset.
Figure 11. Qualitative comparison of the detection performance of the YOLO-series models; (a–c) show detection results for four categories in the RUOD dataset.
Abstract
1. Introduction
- To make the YOLOv8 backbone lighter, EMC convolution is designed with a parameter count and computational load lower than those of the original 3 × 3 convolution. Inheriting the grouping idea of group convolution, it allows the backbone to extract multi-scale feature information more efficiently during feature extraction (first sketch after this list).
- The BIG module is introduced to reduce the erroneous information that accumulates in high-level features as detection depth increases. In the neck network, it performs fast and efficient multi-scale fusion of the local spatial detail information and the global spatial semantic information extracted by the backbone (second sketch after this list).
- The LSH module is introduced because shared convolution drastically lowers the parameter count, while a Scale layer rescales the features to cope with the inconsistent scales of the underwater targets seen by each detection head. Detail-enhanced convolution and group normalization are further designed into the head to improve its detail-capturing ability and minimize accuracy loss while keeping the head's parameter count and computational cost small (third sketch after this list).
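The paper does not include reference code at this point, so the following PyTorch sketch only illustrates the grouped multi-scale idea behind EMC; the module name, the branch kernel sizes, and the depthwise branches are our own assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class EMCSketch(nn.Module):
    """Illustrative grouped multi-scale convolution (not the authors' exact EMC).

    The input is split along the channel dimension, each group passes through a
    depthwise convolution with a different kernel size, and the branch outputs
    are concatenated and fused by a 1x1 convolution.
    """
    def __init__(self, channels: int, kernel_sizes=(1, 3, 5, 7)):
        super().__init__()
        assert channels % len(kernel_sizes) == 0
        group = channels // len(kernel_sizes)
        self.branches = nn.ModuleList(
            nn.Conv2d(group, group, k, padding=k // 2, groups=group)
            for k in kernel_sizes
        )
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = torch.chunk(x, len(self.branches), dim=1)  # channel split
        outs = [branch(c) for branch, c in zip(self.branches, chunks)]
        return self.fuse(torch.cat(outs, dim=1))            # multi-scale fusion

# Shape check: the block preserves spatial size and channel count.
x = torch.randn(1, 64, 80, 80)
assert EMCSketch(64)(x).shape == x.shape
```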
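Similarly, the global/local split inside the BIG module's GLSA block can be pictured with a minimal sketch. The half-and-half channel split, the pooled-attention global branch, and the depthwise local branch below are simplifying assumptions; the actual GLSA module (Figure 5, from DuAT) is more elaborate.

```python
import torch
import torch.nn as nn

class GLSASketch(nn.Module):
    """Illustrative global-to-local spatial aggregation (not the exact GLSA).

    Half of the channels are reweighted by globally pooled context (semantic
    branch); the other half receive a residual depthwise refinement (detail
    branch); a 1x1 convolution fuses the two halves. Assumes an even channel
    count.
    """
    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        self.global_attn = nn.Sequential(   # global semantic branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(half, half, 1),
            nn.Sigmoid(),
        )
        self.local = nn.Sequential(         # local detail branch
            nn.Conv2d(half, half, 3, padding=1, groups=half),
            nn.Conv2d(half, half, 1),
        )
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g, l = torch.chunk(x, 2, dim=1)
        g = g * self.global_attn(g)         # reweight by global context
        l = l + self.local(l)               # residual local refinement
        return self.fuse(torch.cat([g, l], dim=1))
```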
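Finally, the parameter saving of LSH comes from sharing one group-normalized convolution stack across all feature levels and compensating with a learnable per-level Scale. The sketch below assumes three levels and 16 normalization groups, and omits DEConv for brevity (at inference DEConv can be re-parameterized into a plain convolution); it is not the authors' exact head.

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    """Learnable per-level scalar used to rescale the shared head's output."""
    def __init__(self, init: float = 1.0):
        super().__init__()
        self.scale = nn.Parameter(torch.tensor(init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.scale

class SharedHeadSketch(nn.Module):
    """Illustrative lightweight shared head (not the authors' exact LSH).

    One Conv + GroupNorm + SiLU stack is shared by every feature level, so its
    parameters are counted only once; a per-level Scale layer compensates for
    the different target scales each level sees.
    """
    def __init__(self, channels: int, num_outputs: int, num_levels: int = 3):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GroupNorm(16, channels),  # assumes channels divisible by 16
            nn.SiLU(),
        )
        self.pred = nn.Conv2d(channels, num_outputs, 1)
        self.scales = nn.ModuleList(Scale() for _ in range(num_levels))

    def forward(self, feats):
        # feats: per-level maps, all reduced to the same channel count upstream
        return [self.scales[i](self.pred(self.shared(f)))
                for i, f in enumerate(feats)]
```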
2. Materials and Methods
2.1. Efficient Multi-Scale Convolution EMC
2.2. BIG
2.2.1. GSA Module
2.2.2. LSA Module
2.3. LSH
2.3.1. Group Normalization
2.3.2. Detail Enhancement Convolution
3. Experiments and Discussion
3.1. Experimental Environment and Configuration
3.2. Evaluation Metrics
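The tables that follow report precision (P), mAP@0.5, and mAP@0.5:0.95, which rely on the standard definitions:

```latex
% Standard definitions behind the reported metrics (TP/FP/FN: true/false
% positives and false negatives; N: number of classes).
P  = \frac{TP}{TP + FP}, \qquad
R  = \frac{TP}{TP + FN}, \qquad
AP = \int_0^1 P(R)\,\mathrm{d}R, \qquad
mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i
```

mAP@0.5 evaluates AP at an IoU threshold of 0.5, while mAP@0.5:0.95 averages AP over IoU thresholds from 0.5 to 0.95 in steps of 0.05.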
3.3. Ablation Experiments
3.3.1. Prediction Problem of Group Normalization in LSH
3.3.2. Performance Comparison of DEConv at Different Positions in LSH
3.3.3. Performance Comparison of BGLE-YOLO Components on the DUO and RUOD Datasets
3.4. Comparative Experiment
3.4.1. Compared with the Traditional Lightweight YOLO Series
3.4.2. A Brief Comparison with Models from Other Families
3.5. Visualization of Model Detection Effects on the DUO Dataset
3.6. Visualization of Model Detection Effects on the RUOD Dataset
4. Discussion
4.1. Findings
4.2. Limitations and Future Works
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Zhang, R.; Gao, Q.; Gao, K. Impact of marine industrial agglomeration on the high-quality development of the marine economy—A case study of China’s coastal areas. Ecol. Indic. 2024, 158, 111410.
2. Chen, X.; Fan, C.; Shi, J.; Wang, H.; Yao, H. Underwater target detection and embedded deployment based on lightweight YOLO_GN. J. Supercomput. 2024, 80, 14057–14084.
3. Xie, X.; Cheng, G.; Wang, J.; Yao, X.; Han, J. Oriented R-CNN for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 3520–3529.
4. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
5. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
6. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. ISBN 978-3-319-46448-0.
7. Wang, G.; Chen, Y.; An, P.; Hong, H.; Hu, J.; Huang, T. UAV-YOLOv8: A small-object-detection model based on improved YOLOv8 for UAV aerial photography scenarios. Sensors 2023, 23, 7190.
8. Zhang, C.; Zhang, G.; Li, H.; Liu, H.; Tan, J.; Xue, X. Underwater target detection algorithm based on improved YOLOv4 with SemiDSConv and FIoU loss function. Front. Mar. Sci. 2023, 10, 1153416.
9. Liu, P.; Qian, W.; Wang, Y. YWnet: A convolutional block attention-based fusion deep learning method for complex underwater small target detection. Ecol. Inform. 2024, 79, 102401.
10. Yeh, C.-H.; Lin, C.-H.; Kang, L.-W.; Huang, C.-H.; Lin, M.-H.; Chang, C.-Y.; Wang, C.-C. Lightweight deep neural network for joint learning of underwater object detection and color conversion. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6129–6143.
11. Xu, G.; Zhou, D.; Yuan, L.; Guo, W.; Huang, Z.; Zhang, Y. Vision-based underwater target real-time detection for autonomous underwater vehicle subsea exploration. Front. Mar. Sci. 2023, 10, 1112310.
12. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
13. Tang, F.; Xu, Z.; Huang, Q.; Wang, J.; Hou, X.; Su, J.; Liu, J. DuAT: Dual-aggregation transformer network for medical image segmentation. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Xiamen, China, 13–15 October 2023; Springer Nature: Singapore, 2023; pp. 343–356.
14. Chen, J.; Er, M.J. Dynamic YOLO for small underwater object detection. Artif. Intell. Rev. 2024, 57, 1–23.
15. Wang, W.; Dai, J.; Chen, Z.; Huang, Z.; Li, Z.; Zhu, X.; Hu, X.; Lu, T.; Li, H.; Wang, X.; et al. InternImage: Exploring large-scale vision foundation models with deformable convolutions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 14408–14419.
16. Chen, Z.; He, Z.; Lu, Z.-M. DEA-Net: Single image dehazing based on detail-enhanced convolution and content-guided attention. IEEE Trans. Image Process. 2024, 33, 1002–1015.
17. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
18. Chen, F.; Li, S.; Han, J.; Ren, F.; Yang, Z. Review of lightweight deep convolutional neural networks. Arch. Comput. Methods Eng. 2024, 31, 1915–1937.
19. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589.
20. Howard, A.G. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
21. Kong, T.; Sun, F.; Huang, W.; Liu, H. Deep feature pyramid reconfiguration for object detection. arXiv 2018, arXiv:1808.07993.
22. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
23. Elfwing, S.; Uchibe, E.; Doya, K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw. 2018, 107, 3–11.
24. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
25. Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. GCNet: Non-local networks meet squeeze-excitation networks and beyond. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019.
26. Zhou, Y.; Wang, F.; Zhao, J.; Yao, R.; Chen, S.; Ma, H. Spatial-temporal based multihead self-attention for remote sensing image change detection. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 6615–6626.
27. Bello, I.; Zoph, B.; Vaswani, A.; Shlens, J.; Le, Q.V. Attention augmented convolutional networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3286–3295.
28. Liu, K.; Peng, L.; Tang, S. Underwater object detection using TC-YOLO with attention mechanisms. Sensors 2023, 23, 2567.
29. Li, G.; Lin, Y.; Ouyang, D.; Li, S.; Luo, X.; Qu, X. A RGB-thermal image segmentation method based on parameter sharing and attention fusion for safe autonomous driving. IEEE Trans. Intell. Transp. Syst. 2023, 25, 5122–5137.
30. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
31. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167.
32. Tan, R.T. Visibility in bad weather from a single image. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
33. Sobel, I.; Feldman, G. A 3×3 Isotropic Gradient Operator for Image Processing. Available online: https://www.researchgate.net/publication/285159837_A_33_isotropic_gradient_operator_for_image_processing (accessed on 2 March 2025).
34. Liu, C.; Li, H.; Wang, S.; Zhu, M.; Wang, D.; Fan, X. A dataset and benchmark of underwater object detection for robot picking. In Proceedings of the 2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Shenzhen, China, 5–9 July 2021; pp. 1–6.
35. Fu, C.; Liu, R.; Fan, X.; Chen, P.; Fu, H.; Yuan, W.; Zhu, M.; Luo, Z. Rethinking general underwater object detection: Datasets, challenges, and solutions. Neurocomputing 2023, 517, 243–256.
36. Ultralytics. YOLOv5. Available online: https://github.com/ultralytics/yolov5 (accessed on 1 November 2021).
37. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
38. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-time end-to-end object detection. arXiv 2024, arXiv:2405.14458.
39. Jocher, G.; Qiu, J.; Chaurasia, A. Ultralytics YOLO (Version 8.0.0) [Computer Software]. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 2 March 2025).
40. Lyu, C.; Zhang, W.; Huang, H.; Zhou, Y.; Wang, Y.; Liu, Y.; Zhang, S.; Chen, K. RTMDet: An empirical study of designing real-time object detectors. arXiv 2022, arXiv:2212.07784.
41. Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q. DETRs beat YOLOs on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024.
| Training Parameter | Value |
|---|---|
| Image size | 640 × 640 |
| Epochs | 300 |
| Batch size | 32 |
| Workers | 8 |
| Learning rate | 0.01 |
| Optimizer | SGD |
| Cache | False |
| Weight decay | 0.0005 |
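Assuming the standard Ultralytics trainer was used (consistent with reference [39]), these settings correspond to a call like the following sketch; "bgle-yolo.yaml" and "duo.yaml" are placeholder paths, not files from the paper.

```python
from ultralytics import YOLO

# Hypothetical reproduction of the training configuration in the table above.
model = YOLO("bgle-yolo.yaml")   # placeholder model definition
model.train(
    data="duo.yaml",             # dataset definition (DUO or RUOD)
    imgsz=640,                   # image size 640 x 640
    epochs=300,
    batch=32,
    workers=8,
    lr0=0.01,                    # initial learning rate
    optimizer="SGD",
    cache=False,
    weight_decay=0.0005,
)
```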
GN and DEConv ablation in LSH on the DUO dataset:

| Method | mAP@0.5 (%) | mAP@0.5:0.95 (%) |
|---|---|---|
| YOLOv8n | 84.2 | 65.0 |
| BGLE (w/o GN and DEConv) | 83.5 | 64.9 |
| BGLE (w/ GN, w/o DEConv) | 83.7 | 64.9 |
| BGLE (w/o GN, w/ DEConv) | 83.9 | 64.6 |
| BGLE (w/ GN and DEConv) | 84.2 | 65.0 |
GN and DEConv ablation in LSH on the RUOD dataset:

| Method | mAP@0.5 (%) | mAP@0.5:0.95 (%) |
|---|---|---|
| YOLOv8n | 84.4 | 60.8 |
| BGLE (w/o GN and DEConv) | 82.0 | 56.2 |
| BGLE (w/ GN, w/o DEConv) | 83.4 | 58.6 |
| BGLE (w/o GN, w/ DEConv) | 84.0 | 59.6 |
| BGLE (w/ GN and DEConv) | 84.1 | 59.6 |
DEConv position in LSH on the DUO dataset:

| GN | LPC | RPC | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Params (MB) | GFLOPs |
|---|---|---|---|---|---|---|
| | | | 84.2 | 65.0 | 3.0 | 8.1 |
| √ | | | 82.7 | 64.3 | 1.63 | 6.2 |
| √ | √ | | 83.9 | 65.1 | 1.72 | 6.7 |
| √ | | √ | 84.2 | 65.0 | 1.63 | 6.2 |
| √ | √ | √ | 83.4 | 64.7 | 1.72 | 6.7 |
DEConv position in LSH on the RUOD dataset:

| GN | LPC | RPC | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Params (MB) | GFLOPs |
|---|---|---|---|---|---|---|
| | | | 84.4 | 60.8 | 3.0 | 8.1 |
| √ | | | 84.0 | 59.6 | 1.63 | 6.2 |
| √ | √ | | 83.9 | 59.5 | 1.72 | 6.7 |
| √ | | √ | 84.1 | 59.6 | 1.63 | 6.2 |
| √ | √ | √ | 83.8 | 59.3 | 1.72 | 6.7 |
Component ablation on the DUO dataset:

| EMC | BIG | LSH | Precision (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) | GFLOPs | Params (MB) |
|---|---|---|---|---|---|---|---|
| | | | 82.6 | 84.2 | 65.0 | 8.1 | 3.0 |
| √ | | | 83.0 | 84.4 | 65.5 | 7.6 | 2.72 |
| | √ | | 86.2 | 84.8 | 66.1 | 7.6 | 2.14 |
| | | √ | 83.4 | 84.2 | 64.9 | 6.5 | 2.36 |
| √ | √ | | 83.5 | 84.3 | 65.7 | 7.3 | 2.0 |
| | √ | √ | 85.3 | 83.9 | 65.1 | 6.4 | 1.77 |
| √ | √ | √ | 86.0 | 84.2 | 65.0 | 6.2 | 1.63 |
Component ablation on the RUOD dataset:

| EMC | BIG | LSH | Precision (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) | GFLOPs | Params (MB) |
|---|---|---|---|---|---|---|---|
| | | | 85.2 | 84.4 | 60.8 | 8.1 | 3.0 |
| √ | | | 85.4 | 84.4 | 60.8 | 7.6 | 2.72 |
| | √ | | 85.2 | 84.3 | 60.4 | 7.6 | 2.14 |
| | | √ | 85.4 | 84.5 | 60.3 | 6.5 | 2.36 |
| √ | √ | | 85.3 | 84.2 | 60.1 | 7.3 | 2.0 |
| | √ | √ | 84.6 | 84.0 | 59.5 | 6.4 | 1.77 |
| √ | √ | √ | 85.1 | 84.1 | 59.6 | 6.2 | 1.63 |
Comparison with lightweight YOLO models on the DUO dataset:

| Model | Precision (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Params (MB) | GFLOPs |
|---|---|---|---|---|---|
| YOLOv5n [36] | 84.4 | 82.4 | 59.8 | 1.7 | 4.1 |
| YOLOv7-tiny [37] | 86.3 | 85.1 | 63.2 | 6.0 | 13.0 |
| YOLOv8n | 82.6 | 84.2 | 65.0 | 3.0 | 8.1 |
| YOLOv10n [38] | 83.7 | 83.0 | 63.7 | 2.7 | 8.2 |
| YOLOv11n [39] | 83.7 | 83.6 | 64.9 | 2.6 | 6.3 |
| BGLE-YOLO | 86.0 | 84.2 | 65.0 | 1.6 | 6.2 |
Comparison with lightweight YOLO models on the RUOD dataset:

| Model | Precision (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Params (MB) | GFLOPs |
|---|---|---|---|---|---|
| YOLOv5n | 85.8 | 83.2 | 53.9 | 1.7 | 4.2 |
| YOLOv7-tiny | 85.0 | 85.2 | 57.9 | 6.0 | 13.1 |
| YOLOv8n | 85.2 | 84.4 | 60.8 | 3.0 | 8.1 |
| YOLOv10n | 85.0 | 83.9 | 59.9 | 2.7 | 8.2 |
| YOLOv11n | 85.4 | 84.7 | 61.4 | 2.6 | 6.3 |
| BGLE-YOLO | 85.1 | 84.1 | 59.6 | 1.6 | 6.2 |
Comparison with models from other families on the DUO dataset:

| Model | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Params (MB) | GFLOPs |
|---|---|---|---|---|
| RTMDet-Tiny [40] | 85.6 | 66.5 | 4.8 | 8.1 |
| DETR-R50 [41] | 84.1 | 62.8 | 41.5 | 96.5 |
| YOLOv8n | 84.2 | 65.0 | 3.0 | 8.1 |
| BGLE-YOLO | 84.2 | 65.0 | 1.6 | 6.2 |
Comparison with models from other families on the RUOD dataset:

| Model | mAP@0.5 (%) | mAP@0.5:0.95 (%) | Params (MB) | GFLOPs |
|---|---|---|---|---|
| RTMDet-Tiny | 85.8 | 62.4 | 4.88 | 8.0 |
| DETR-R50 | 85.5 | 59.4 | 41.55 | 91.7 |
| YOLOv8n | 84.4 | 60.8 | 3.0 | 8.1 |
| BGLE-YOLO | 84.1 | 59.6 | 1.63 | 6.2 |